Self-supervised learning for few-shot image classification

Abstract

Few-shot image classification aims to classify unseen classes with limited labeled samples. Recent works benefit from the meta-learning process with episodic tasks and can fast adapt to class changes from training to testing. Due to the limited number of samples for each task, the initial embedding network for meta-learning becomes an essential component and can largely affect performance in practice. To this end, many pre-training methods have been proposed, but most of them are trained in a supervised way with limited transfer ability for unseen classes. In this paper, we propose to train a more generalized embedding network with self-supervised learning (SSL), which can provide slow and robust representations for downstream tasks by learning from the data itself. We evaluate our work through extensive comparisons with previous baseline methods on two few-shot classification datasets (i.e., MiniImageNet and CUB). Based on the evaluation results, the proposed method achieves significantly better performance, i.e., it improves the 1-shot and 5-shot tasks by nearly 3% and 4% on MiniImageNet, and by nearly 9% and 3% on CUB. Moreover, the proposed method gains further improvements of (15%, 13%) on MiniImageNet and (15%, 8%) on CUB by pre-training with more unlabeled data. Our code will be available at https://github.com/phecy/ssl-few-shot.

Da Chen, Yuefeng Chen, Yuhong Li, Feng Mao, Yuan He, Hui Xue (Equal Contribution)
Alibaba Group, China
{chen.cd, yuefeng.chenyf, daniel.lyh, maofeng.mf, heyuan.hy, hui.xueh}@alibaba-inc.com

Keywords: Self-supervised learning, Few-shot learning, Embedding network, Metric learning, Image classification

1 Introduction

Recent advances in deep learning techniques have led to significant progress in many areas. The main reason for this success is the ability to train deep models that retain profound knowledge from large-scale labeled datasets [6, 9]. This is somewhat contrary to human learning behavior: one can easily classify objects from just a few examples with limited prior knowledge. How to computationally model such behavior motivates recent research in few-shot learning, where the focus is on adapting the model to new data or tasks with a restricted number of instances.

One popular solution for few-shot classification is to apply a meta-learning process, in which the dataset is divided into subsets for different meta tasks in order to learn how to adapt the model as the task changes. The main challenge here is that meta-learning can easily lead to overfitting, as only a few samples are available for each class. To address this problem, an attention mechanism has been proposed that achieves good classification of the unlabeled samples by learning an embedding of the labeled ones [30]. By sub-sampling classes and the associated data to simulate few-shot tasks [23, 26, 28], the so-called episodes can further benefit meta-learning by constructing a probability model to predict a decision boundary between classes. Recent meta-learning methods [24, 22] focus on retrieving a transferable embedding from the dataset along with the relation between images and their class descriptions. This is done by decomposing the training into two stages, i.e., (i) learning a robust and transferable embedding, and (ii) fine-tuning the learned embedding for the downstream classification task. All these works demonstrate that a robust pre-trained embedding network is essential to the performance of the few-shot image classification task.

Current methods [13, 21, 22, 25] with good performance mostly apply a ResNet12 [10] or a wide ResNet [32] as the embedding network and surpass the methods [2, 4] that use deeper networks. We argue that larger networks are abandoned mainly because all these methods are trained in a supervised way with limited labeled samples. In this paper, we propose to apply a much larger embedding network trained with self-supervised learning (SSL) in combination with episodic-task-based meta-learning. According to the evaluation presented in Section 4, the proposed method significantly improves few-shot image classification performance over baseline methods on two common datasets. As a remark, under the same experiment setting, the proposed method improves the 1-shot and 5-shot tasks by nearly 3% and 4% on MiniImageNet, and by nearly 9% and 3% on CUB. Moreover, the proposed method gains improvements of 15% and 13% on MiniImageNet and 15% and 8% on CUB in the two tasks by pre-training with more unlabeled data. We also observe that the self-supervised pre-trained model can be robustly transferred to other datasets.

Figure 1: The overall architecture of our approach. LEFT: Training the embedding network by self-supervised learning (AMDIM). The pretext task is designed to maximize the mutual information between two views generated from the same image by data augmentation. RIGHT: Meta-learning with episodic tasks (3-way 1-shot example). For each task, the training samples and query samples are encoded by the embedding network. Query sample embeddings are compared with the centroids of the training sample embeddings to make the prediction.

2 Related Work

Few-shot learning is an active research topic and has been extensively studied. In this paper, we primarily review recent deep-learning based approaches that are most relevant to our work. A number of works aim to improve the robustness of the training process. Garcia et al. [8] propose a graph neural network based on the generic message passing inference method. Zhao et al. [33] split the features into three orthogonal parts to improve the classification performance for few-shot learning, allowing simultaneous feature selection and dense estimation. Chen et al. [4] propose a Self-Jig algorithm to augment the input data in few-shot learning by synthesizing new images that are either labeled or unlabeled.

A popular strategy for few-shot learning is meta-learning (also called learning-to-learn) with multiple auxiliary tasks [30, 7, 28, 22, 13]. The key is how to robustly accelerate the learning progress of the network without suffering from overfitting on limited training data. Finn et al. propose MAML [7] to search for the best initial weights through gradient descent, making the subsequent fine-tuning easier. REPTILE [19] simplifies the complex computation of MAML by incorporating an L2 loss, but still operates in a high-dimensional parameter space. To reduce the complexity, Rusu et al. propose LEO [25], which learns a low-dimensional latent embedding of the model. CAML [13] extends MAML by partitioning the model parameters into context parameters and shared parameters, enabling a bigger network without overfitting. Another stream of meta-learning based approaches [26, 21, 30, 28] attempts to learn a deep embedding model that effectively projects the input samples into a specific feature space. The samples can then be classified by the nearest neighbour (NN) criterion using a distance function such as cosine or Euclidean distance. Koch et al. [14] propose a Siamese network to extract embedding features from input images and pull images of the same class together. Matching Network [30] utilizes an augmented neural network for feature embedding, forming the basis for metric learning.

Self-supervised learning (SSL) aims to learn robust representations from the data itself without class labels. The main challenge is to design pretext tasks that are complex enough to exploit high-level, compact semantic visual representations that are useful for solving downstream tasks. This is consistent with the mission of the pre-trained embedding network in few-shot learning. The work of [15] revisits several state-of-the-art methods based on various classification-based pretext tasks (e.g., Rotation, Exemplar, RelPatchLoc, Jigsaw). Recently, by maximizing mutual information between features extracted from multiple views of a shared context [20, 11, 29, 1], SSL has achieved performance comparable to supervised learning. Among them, Contrastive Predictive Coding [20, 11] learns from two views (the past and the future) and is applicable to sequential data. Contrastive Multiview Coding [29] extends the framework to learn representations from multiple views of a dataset. AMDIM [1] learns features from multiple views produced by repeatedly applying data augmentation to the input image, achieving 68.1% accuracy on ImageNet.

3 Method

Few-shot learning is a challenging problem, as only limited data is available for training and the performance must be verified on data from unseen classes. One popular solution to the few-shot classification problem is to apply meta-learning on top of a pre-trained embedding network, and most current methods focus mainly on the second, meta-learning stage. In this work, we follow this two-stage paradigm but utilize self-supervised learning to train a large embedding network as our strong base.

3.1 Self-supervised learning stage

Our goal is to learn representations that enhance the generalization ability of the features. In our approach, we use Augmented Multiscale Deep InfoMax (AMDIM) [1] as our self-supervised model. The pretext task is designed to maximize the mutual information between features extracted from multiple views of a shared context.

The mutual information (MI) measures the shared information between two random variables $X$ and $Y$, and is defined as the Kullback–Leibler (KL) divergence between the joint distribution and the product of the marginals:

$I(X;Y) = D_{\mathrm{KL}}\big(p(x,y)\,\|\,p(x)p(y)\big) = \mathbb{E}_{p(x,y)}\left[\log \frac{p(x,y)}{p(x)\,p(y)}\right]$   (1)

where $p(x,y)$ is the joint distribution and $p(x)$ and $p(y)$ are the marginal distributions of $X$ and $Y$. Estimating MI is challenging, as we only have samples rather than direct access to the underlying distributions. Oord et al. [20] proved that we can maximize a lower bound on mutual information by minimizing the Noise Contrastive Estimation (NCE) loss based on negative sampling.
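For reference, the bound from [20] (often called the InfoNCE bound) can be written as follows; the batch size $N$ and the similarity function $\phi$ are our notation for illustration rather than symbols taken from the original paper:

$\mathcal{L}_{\mathrm{NCE}} = -\,\mathbb{E}\left[\log \frac{\exp\big(\phi(x_i, y_i)\big)}{\sum_{j=1}^{N} \exp\big(\phi(x_i, y_j)\big)}\right], \qquad I(X;Y) \ge \log N - \mathcal{L}_{\mathrm{NCE}}$

so minimizing $\mathcal{L}_{\mathrm{NCE}}$ over positive pairs $(x_i, y_i)$ contrasted against $N-1$ negatives tightens a lower bound on the mutual information.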

The core concept of AMDIM is to maximize mutual information between global features and local features extracted from two augmented views $x_1$ and $x_2$ of the same image. Specifically, it maximizes the mutual information between $f_g(x_1)$ and $f_5(x_2)$, $f_g(x_1)$ and $f_7(x_2)$, and $f_5(x_1)$ and $f_5(x_2)$, where $f_g(x)$ is the global feature, $f_5(x)$ is the encoder's $5\times5$ local feature map, and $f_7(x)$ is the encoder's $7\times7$ feature map. For example, the NCE loss between $f_g(x_1)$ and $f_5(x_2)$ is defined as:

$\mathcal{L}_{\mathrm{NCE}}\big(f_g(x_1), f_5(x_2)\big) = -\log \frac{\exp\big(\phi(f_g(x_1), f_5(x_2))\big)}{\exp\big(\phi(f_g(x_1), f_5(x_2))\big) + \sum_{\tilde{x} \in \mathcal{N}_x} \exp\big(\phi(f_g(x_1), f_5(\tilde{x}))\big)}$   (2)

where $\mathcal{N}_x$ is the set of negative samples for image $x$ and $\phi$ is the distance (similarity) function. The overall loss between $x_1$ and $x_2$ is then:

$\mathcal{L}(x_1, x_2) = \mathcal{L}_{\mathrm{NCE}}\big(f_g(x_1), f_5(x_2)\big) + \mathcal{L}_{\mathrm{NCE}}\big(f_g(x_1), f_7(x_2)\big) + \mathcal{L}_{\mathrm{NCE}}\big(f_5(x_1), f_5(x_2)\big)$   (3)

The left part of Figure 1 gives an overview of the AMDIM self-supervised learning method. The red and blue lines show the local and global feature interactions between the two views $x_1$ and $x_2$. The details of the encoder network are given in Table 1.
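To make Eq. (2) concrete, the following is a minimal PyTorch-style sketch of a single NCE term, assuming the global and local features have already been projected to a common dimension and using a plain dot product as the similarity $\phi$; the function and tensor names are illustrative, and this is not the official AMDIM implementation.

    import torch
    import torch.nn.functional as F

    def nce_loss(global_feat, pos_local_map, neg_local_maps):
        """Simplified NCE term of Eq. (2) with dot-product similarity.

        global_feat:     (B, D)           global feature f_g(x1) of each image
        pos_local_map:   (B, D, H, W)     local map f_5(x2) of the other view (positives)
        neg_local_maps:  (B, N, D, H, W)  local maps from N negative images
        """
        # similarity between each global feature and every positive local location
        pos = torch.einsum('bd,bdhw->bhw', global_feat, pos_local_map).flatten(1)     # (B, H*W)
        # similarity with every negative local location
        neg = torch.einsum('bd,bndhw->bnhw', global_feat, neg_local_maps).flatten(1)  # (B, N*H*W)
        # each positive location is contrasted against all negative locations
        logits = torch.cat([pos.unsqueeze(-1),
                            neg.unsqueeze(1).expand(-1, pos.size(1), -1)], dim=-1)    # (B, H*W, 1+N*H*W)
        log_prob_pos = F.log_softmax(logits, dim=-1)[..., 0]
        return -log_prob_pos.mean()

In the full loss of Eq. (3), such a term is evaluated for each of the three feature pairings and the results are summed.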

 

Layers     Output Size    ConvBlocks (kernel, output_channels, stride, pad)
conv1      62 × 62
conv2_x    30 × 30
conv3_x    14 × 14
conv4_x    7 × 7
conv5_x    5 × 5
conv6_x    5 × 5
conv7      1 × 1
# params   198M (ndf=192, nrkhs=1536, ndepth=8)
# FLOPs    10.96 GFLOPs (ndf=192, nrkhs=1536, ndepth=8)

 

Table 1: Model architecture of AmdimNet. ndf is the base number of output channels of the network, ndepth controls the model's depth, and nrkhs is the embedding dimension. Each convolution block in conv2_x to conv6_x contains 2 ndepth convolution layers.

3.2 Meta-learning stage

Given an embedding network, meta-learning is applied to fine-tune it to fit the class changes required by few-shot classification. A typical meta-learning setup can be considered as an $N$-way $K$-shot episodic classification problem over multiple tasks [30]. For each classification task $T$, we have $N$ classes with $K$ samples from each class. The entire training dataset can be written as $D = \{(x_i, y_i)\}$ with labels $y_i \in \{1, \ldots, C\}$, where $C$ is the total number of classes in $D$. For a specific task $T$, $C_T \subset \{1, \ldots, C\}$ denotes the class labels associated therein; here $|C_T| = N$ is the number of classes in the support set for a single training task. The support set and query set are randomly selected from $D$: (a) the support set for task $T$ is denoted by $S = \{(x_i, y_i)\}_{i=1}^{N \times K}$ with $y_i \in C_T$ ($N$-way $K$-shot); (b) the query set is $Q = \{(\tilde{x}_j, \tilde{y}_j)\}_{j=1}^{q}$, where $q$ is the number of samples selected for meta testing. An illustrative sampling procedure is sketched below.
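As an illustration of this episodic setup, the following minimal sketch samples one $N$-way $K$-shot task from a dataset organized as a dict mapping class labels to image lists; the helper name and data layout are our own and not part of the original framework.

    import random

    def sample_episode(data_by_class, n_way=5, k_shot=1, n_query=15):
        """Sample one N-way K-shot episode: a support set S and a query set Q."""
        classes = random.sample(sorted(data_by_class), n_way)
        support, query = [], []
        for episode_label, c in enumerate(classes):
            samples = random.sample(data_by_class[c], k_shot + n_query)
            support += [(x, episode_label) for x in samples[:k_shot]]
            query += [(x, episode_label) for x in samples[k_shot:]]
        return support, query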

As mentioned in Section 2, recent popular frameworks such as Snell et al. [26] learn an embedding function that maps all input samples into a description space in which each class is represented by a mean vector. Class $k$ is represented by the centroid $c_k$ of the embedding features of its support samples, obtained as:

$c_k = \frac{1}{K} \sum_{(x_i, y_i) \in S,\, y_i = k} f_\theta(x_i)$   (4)

where $f_\theta$ is the embedding function.

As a metric learning based method, we employ a distance function $d$ and produce a distribution over all classes given a query sample $x$ from the query set $Q$:

$p(y = k \mid x) = \frac{\exp\big(-d(f_\theta(x), c_k)\big)}{\sum_{k'} \exp\big(-d(f_\theta(x), c_{k'})\big)}$   (5)

In this paper, the Euclidean distance is chosen as the distance function $d$. As shown in Eq. 5, the distribution is based on a softmax over the distances between the embeddings of the query samples and the class centroids. The loss in the meta-learning stage then reads:

$\mathcal{L} = -\frac{1}{|Q|} \sum_{(x, y) \in Q} \log p(y \mid x)$   (6)
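A compact PyTorch-style sketch of Eqs. (4)-(6) is given below, assuming the support and query images have already been encoded by the embedding network $f_\theta$; the function name, tensor shapes, and episode labels in [0, N) are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def prototypical_loss(support_emb, support_labels, query_emb, query_labels, n_way):
        """Eq. (4): class centroids; Eq. (5): softmax over negative distances; Eq. (6): NLL loss.

        support_emb: (N*K, D) embeddings of the support set, labels in [0, n_way)
        query_emb:   (Q, D)   embeddings of the query set, labels in [0, n_way)
        """
        # Eq. (4): centroid of each class's support embeddings
        prototypes = torch.stack(
            [support_emb[support_labels == k].mean(dim=0) for k in range(n_way)])  # (n_way, D)
        # Eq. (5): distribution from a softmax over negative Euclidean distances
        dists = torch.cdist(query_emb, prototypes)                                  # (Q, n_way)
        log_p = F.log_softmax(-dists, dim=1)
        # Eq. (6): average negative log-likelihood over the query set
        return F.nll_loss(log_p, query_labels)

During meta-learning, this loss is computed on each sampled episode and back-propagated through the embedding network.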

4 Experimental Results

In this section, we first introduce the datasets and training process used in our evaluation, then show quantitative comparisons against other baseline methods, and finally conduct a detailed study to validate the transfer ability of our approach.

4.1 Datasets

The MiniImageNet dataset, proposed in [30], is a benchmark for evaluating the performance of few-shot learning methods. It is a subset randomly selected from ImageNet and contains 60,000 images from 100 classes, with 600 images per class. We follow the data split strategy in [23] and sample 64 classes for training, 16 classes for validation, and 20 classes for testing.

The Caltech-UCSD Birds-200-2011 (CUB-200-2011) dataset, proposed in [31], is a dataset for fine-grained classification. It contains 200 classes of birds with 11,788 images in total. For evaluation, we follow the split in [12]: the 200 species are randomly split into 100 classes for training, 50 classes for validation, and 50 classes for testing.

4.2 Training Details

Several recent works show that a typical training process can include a pre-trained network [22, 25] or employ co-training [21] for feature embedding, which can significantly improve classification accuracy. In this paper, we adopt the AMDIM [1] SSL framework to pre-train the feature embedding network. AmdimNet (ndf=192, ndepth=8, nrkhs=1536) is used for all datasets and the embedding dimension is 1536. Adam is chosen as the optimizer, and the same input resolution is used across all datasets. For the MiniImageNet dataset, three embedding models are trained: Mini80-SSL is trained with self-supervision on 48,000 images (the 80 training and validation classes) without labels; Mini80-SL is trained in a supervised way with the same AmdimNet using a cross-entropy loss with labels; Image900-SSL is SSL-trained on all images from ImageNet1K except those in MiniImageNet. For the CUB dataset, CUB150-SSL is trained by SSL on the 150 training and validation classes; CUB150-SL is the supervised counterpart; Image1K-SSL is SSL-trained on all images from ImageNet1K without labels.

 

Baselines             Embedding Net    1-Shot 5-Way     5-Shot 5-Way

MatchingNet [30]      4 Conv           43.56 ± 0.84%    55.31 ± 0.73%
MAML [7]              4 Conv           48.70 ± 1.84%    63.11 ± 0.92%
RelationNet [28]      4 Conv           50.44 ± 0.82%    65.32 ± 0.70%
REPTILE [19]          4 Conv           49.97 ± 0.32%    65.99 ± 0.58%
ProtoNet [26]         4 Conv           49.42 ± 0.78%    68.20 ± 0.66%
Baseline* [3]         4 Conv           41.08 ± 0.70%    54.50 ± 0.66%
Spot&learn [5]        4 Conv           51.03 ± 0.78%    67.96 ± 0.71%
DN4 [16]              4 Conv           51.24 ± 0.74%    71.02 ± 0.64%
SNAIL [18]            ResNet12         55.71 ± 0.99%    68.88 ± 0.92%
ProtoNet [26]         ResNet12         56.50 ± 0.40%    74.20 ± 0.20%
CAML [13]             ResNet12         59.23 ± 0.99%    72.35 ± 0.71%
TPN [17]              ResNet12         59.46%           75.65%
MTL [27]              ResNet12         61.20 ± 1.80%    75.50 ± 0.80%
DN4 [16]              ResNet12         54.37 ± 0.36%    74.44 ± 0.29%
TADAM [21]            ResNet12         58.50%           76.70%
Qiao-WRN [22]         Wide-ResNet28    59.60 ± 0.41%    73.74 ± 0.19%
LEO [25]              Wide-ResNet28    61.76 ± 0.08%    77.59 ± 0.12%
Dis. k-shot [2]       ResNet34         56.30 ± 0.40%    73.90 ± 0.30%
Self-Jig(SVM) [4]     ResNet50         58.80 ± 1.36%    76.71 ± 0.72%

Ours_Mini80_SL        AmdimNet         43.92 ± 0.19%    67.13 ± 0.16%
Ours_Mini80_SSL†      AmdimNet         46.13 ± 0.17%    70.14 ± 0.15%
Ours_Mini80_SSL       AmdimNet         64.03 ± 0.20%    81.15 ± 0.14%
Ours_Image900_SSL     AmdimNet         76.82 ± 0.19%    90.98 ± 0.10%

 

Table 2: Few-shot classification accuracy on MiniImageNet for the 1-shot 5-way and 5-shot 5-way tasks. All accuracy results are reported with 95% confidence intervals. † indicates a result obtained without meta-learning.

4.3 Quantitative comparison

For the MiniImageNet dataset, we evaluate our method on two common few-shot learning tasks, i.e., the 1-shot 5-way and 5-shot 5-way tasks, against 18 baseline methods with different embedding networks, including classical ones [26, 30, 7] and recently proposed methods [25, 21, 17]. For the CUB dataset, we follow the recent work [3] to evaluate the robustness of the proposed framework against 7 other alternatives on this fine-grained dataset.

As detailed in Table 2, the proposed method outperforms all baselines in the tested tasks. In the 1-shot 5-way test, our approach achieves improvements of roughly 7.5% and 2.3% over ProtoNet [26] and LEO [25] respectively. The former is an amended variant of ProtoNet using a pre-trained ResNet as the embedding network and shares the same meta-learning stage with the proposed method; the latter is the state-of-the-art method. In the 5-shot 5-way experiment, we observe a similar improvement in accuracy. Furthermore, the performance of our proposed method increases significantly when more images/classes are used for pre-training: it gains a further improvement of nearly 10% on the 5-shot 5-way test and nearly 13% on the 1-shot 5-way test.

Table 3 illustrates our experiment on the CUB dataset. Our proposed method yields the highest accuracy in all trials. In the 1-shot 5-way test, our method reaches 71.85%, a margin of more than 20% over the classic ProtoNet [26]. In the 5-shot 5-way test, our method reaches 84.29%, which improves over DN4-DA [16], the strongest baseline, by about 2.4%. Compared to Baseline++ [3], our method shows a significant improvement, i.e., more than 11% and nearly 5% in the two tests.

 

Baselines           Embedding Net    1-Shot 5-Way     5-Shot 5-Way

MatchingNet [30]    4 Conv           61.16 ± 0.89%    72.86 ± 0.70%
MAML [7]            4 Conv           55.92 ± 0.95%    72.09 ± 0.76%
ProtoNet [26]       4 Conv           51.31 ± 0.91%    70.77 ± 0.69%
MACO [12]           4 Conv           60.76%           74.96%
RelationNet [28]    4 Conv           62.45 ± 0.98%    76.11 ± 0.69%
Baseline++ [3]      4 Conv           60.53 ± 0.83%    79.34 ± 0.61%
DN4-DA [16]         4 Conv           53.15 ± 0.84%    81.90 ± 0.60%

Ours_CUB150_SL      AmdimNet         45.10 ± 0.21%    74.59 ± 0.16%
Ours_CUB150_SSL†    AmdimNet         40.83 ± 0.16%    65.27 ± 0.18%
Ours_CUB150_SSL     AmdimNet         71.85 ± 0.22%    84.29 ± 0.15%
Ours_Image1K_SSL    AmdimNet         77.09 ± 0.21%    89.18 ± 0.13%

 

Table 3: Few-shot classification accuracy on the CUB dataset [31] for the 1-shot 5-way and 5-shot 5-way tasks. All accuracy results are reported with 95% confidence intervals. † indicates a result obtained without meta-learning.

4.4 Ablation Study

As shown in the quantitative evaluation, the proposed method can significantly improve performance on the few-shot classification task through self-supervised pre-training with a large network. One concern that may be raised is whether the gain simply comes from the increased capacity of the network. To examine this, we train the same embedding network with labeled data (Mini80-SL and CUB150-SL, as detailed in Section 4.2). As shown in Table 2 and Table 3, it performs even worse than the methods with simple 4 Conv block embedding networks: such a big network, trained in a supervised way on limited data, overfits and cannot adjust to new unseen classes during testing. With SSL-based pre-training, however, a more generalized embedding network is obtained and the results improve significantly. One may also question the effectiveness of the meta-learning fine-tuning in the second stage. To test this, the pre-trained embedding network is directly applied to the task with nearest neighbour (NN) classification. As shown in the test results on both datasets, meta-learning effectively fine-tunes the embedding network and achieves a remarkable improvement.
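For reference, the no-meta-learning baseline discussed above can be sketched as follows, assuming a frozen embedding function embed_fn and episodes in the (image, label) format of the earlier sampling sketch; for K > 1 shots this uses class centroids, which reduces to plain nearest neighbour in the 1-shot case.

    import torch

    def nn_accuracy(embed_fn, support, query, n_way):
        """Classify query samples by the nearest class centroid of frozen embeddings."""
        with torch.no_grad():
            s_emb = torch.stack([embed_fn(x) for x, _ in support])
            q_emb = torch.stack([embed_fn(x) for x, _ in query])
        s_lab = torch.tensor([y for _, y in support])
        q_lab = torch.tensor([y for _, y in query])
        centroids = torch.stack([s_emb[s_lab == k].mean(dim=0) for k in range(n_way)])
        pred = torch.cdist(q_emb, centroids).argmin(dim=1)
        return (pred == q_lab).float().mean().item()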

We also include more unlabeled data during SSL pre-training and observe an even more significant improvement. As shown in Table 2, the proposed method gains improvements of 15% and 13% in the two test tasks. As analyzed in detail in [3], current few-shot learning methods cannot efficiently transfer across domains, i.e., the training domain cannot have a large gap with the testing set. In this paper, a transferability test is also conducted by pre-training the embedding network on ImageNet and applying it to the CUB dataset. As shown in Table 3, the proposed method with an ImageNet pre-trained embedding network can be efficiently transferred to the CUB dataset and gains improvements of 15% and 8% in the two test tasks.

5 Conclusion

In this paper, we propose to utilize self-supervised learning to efficiently train a robust embedding network for few-shot image classification. The resulting embedding network is more generalized and more transferable compared to other baselines. After fine-tuning through the meta-learning process, the proposed method significantly outperforms all baselines in the quantitative results on two common few-shot classification datasets. The current framework can be extended in several ways in the future. For instance, one direction is to combine the two stages and develop an end-to-end method for this task. Another direction is to investigate the effectiveness of the proposed method on other few-shot tasks, such as few-shot detection.

References

  • [1] P. Bachman, R. D. Hjelm, and W. Buchwalter (2019) Learning representations by maximizing mutual information across views. arXiv preprint arXiv:1906.00910. Cited by: §2, §3.1, §4.2.
  • [2] M. Bauer, M. Rojas-Carulla, J. B. Swiatkowski, B. Scholkopf, and R. E. Turner (2017) Discriminative k-shot learning using probabilistic models. arXiv preprint arXiv:1706.00326. Cited by: §1, Table 2.
  • [3] W. Chen, Y. Liu, Z. Kira, Y. F. Wang, and J. Huang (2019) A closer look at few-shot classification. In ICLR, External Links: Link Cited by: §4.3, §4.3, §4.4, Table 2, Table 3.
  • [4] Z. Chen, Y. Fu, K. Chen, and Y. Jiang (2019) Image block augmentation for one-shot learning.. In AAAI, Cited by: §1, §2, Table 2.
  • [5] W. Chu, Y. Li, J. Chang, and Y. F. Wang (2019) Spot and learn: a maximum-entropy patch sampler for few-shot image classification. In CVPR, pp. 6251–6260. Cited by: Table 2.
  • [6] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) Imagenet: a large-scale hierarchical image database. In CVPR, pp. 248–255. Cited by: §1.
  • [7] C. Finn, P. Abbeel, and S. Levine (2017) Model-agnostic meta-learning for fast adaptation of deep networks. In ICML, pp. 1126–1135. Cited by: §2, §4.3, Table 2, Table 3.
  • [8] V. Garcia and J. Bruna (2017) Few-shot learning with graph neural networks. arXiv preprint arXiv:1711.04043. Cited by: §2.
  • [9] J. F. Gemmeke, D. P. W. Ellis, D. Freedman, A. Jansen, W. Lawrence, R. C. Moore, M. Plakal, and M. Ritter (2017) Audio set: an ontology and human-labeled dataset for audio events. In Proc. IEEE ICASSP 2017, New Orleans, LA. Cited by: §1.
  • [10] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In CVPR, pp. 770–778. Cited by: §1.
  • [11] O. J. Hénaff, A. Razavi, C. Doersch, S. Eslami, and A. v. d. Oord (2019) Data-efficient image recognition with contrastive predictive coding. arXiv preprint arXiv:1905.09272. Cited by: §2.
  • [12] N. Hilliard, L. Phillips, S. Howland, A. Yankov, C. D. Corley, and N. O. Hodas (2018) Few-shot learning with metric-agnostic conditional embeddings. arXiv preprint arXiv:1802.04376. Cited by: §4.1, Table 3.
  • [13] X. Jiang, M. Havaei, F. Varno, G. Chartrand, N. Chapados, and S. Matwin (2019) Learning to learn with conditional class dependencies. In ICLR, External Links: Link Cited by: §1, §2, Table 2.
  • [14] G. Koch, R. Zemel, and R. Salakhutdinov (2015) Siamese neural networks for one-shot image recognition. In ICML Deep Learning Workshop, Vol. 2. Cited by: §2.
  • [15] A. Kolesnikov, X. Zhai, and L. Beyer (2019) Revisiting self-supervised visual representation learning. In CVPR, pp. 1920–1929. Cited by: §2.
  • [16] W. Li, L. Wang, J. Xu, J. Huo, Y. Gao, and J. Luo (2019) Revisiting local descriptor based image-to-class measure for few-shot learning. In CVPR, pp. 7260–7268. Cited by: §4.3, Table 2, Table 3.
  • [17] Y. Liu, J. Lee, M. Park, S. Kim, E. Yang, S. Hwang, and Y. Yang (2019) Learning to propagate labels: transductive propagation network for few-shot learning. In ICLR, External Links: Link Cited by: §4.3, Table 2.
  • [18] N. Mishra, M. Rohaninejad, X. Chen, and P. Abbeel (2017) A simple neural attentive meta-learner. arXiv preprint arXiv:1707.03141. Cited by: Table 2.
  • [19] A. Nichol and J. Schulman (2018) Reptile: a scalable metalearning algorithm. arXiv preprint arXiv:1803.02999. Cited by: §2, Table 2.
  • [20] A. v. d. Oord, Y. Li, and O. Vinyals (2018) Representation learning with contrastive predictive coding. arXiv preprint arXiv:1807.03748. Cited by: §2, §3.1.
  • [21] B. Oreshkin, P. R. López, and A. Lacoste (2018) TADAM: task dependent adaptive metric for improved few-shot learning. In NeurIPS, pp. 719–729. Cited by: §1, §2, §4.2, §4.3, Table 2.
  • [22] S. Qiao, C. Liu, W. Shen, and A. L. Yuille (2018) Few-shot image recognition by predicting parameters from activations. In CVPR, pp. 7229–7238. Cited by: §1, §1, §2, §4.2, Table 2.
  • [23] S. Ravi and H. Larochelle (2017) Optimization as a model for few-shot learning. In ICLR, Cited by: §1, §4.1.
  • [24] A. A. Rusu, D. Rao, J. Sygnowski, O. Vinyals, R. Pascanu, S. Osindero, and R. Hadsell (2018) Meta-learning with latent embedding optimization. arXiv preprint arXiv:1807.05960. Cited by: §1.
  • [25] A. A. Rusu, D. Rao, J. Sygnowski, O. Vinyals, R. Pascanu, S. Osindero, and R. Hadsell (2019) Meta-learning with latent embedding optimization. In ICLR, External Links: Link Cited by: §1, §2, §4.2, §4.3, §4.3, Table 2.
  • [26] J. Snell, K. Swersky, and R. Zemel (2017) Prototypical networks for few-shot learning. In NeurIPS, pp. 4077–4087. Cited by: §1, §2, §3.2, §4.3, §4.3, §4.3, Table 2, Table 3.
  • [27] Q. Sun, Y. Liu, T. Chua, and B. Schiele (2019) Meta-transfer learning for few-shot learning. In CVPR, pp. 403–412. Cited by: Table 2.
  • [28] F. Sung, Y. Yang, L. Zhang, T. Xiang, P. H. Torr, and T. M. Hospedales (2018) Learning to compare: relation network for few-shot learning. In CVPR, pp. 1199–1208. Cited by: §1, §2, Table 2, Table 3.
  • [29] Y. Tian, D. Krishnan, and P. Isola (2019) Contrastive multiview coding. arXiv preprint arXiv:1906.05849. Cited by: §2.
  • [30] O. Vinyals, C. Blundell, T. Lillicrap, D. Wierstra, et al. (2016) Matching networks for one shot learning. In NeurIPS, pp. 3630–3638. Cited by: §1, §2, §3.2, §4.1, §4.3, Table 2, Table 3.
  • [31] C. Wah, S. Branson, P. Welinder, P. Perona, and S. Belongie (2011) The Caltech-UCSD Birds-200-2011 Dataset. Technical report Technical Report CNS-TR-2011-001, California Institute of Technology. Cited by: §4.1, Table 3.
  • [32] S. Zagoruyko and N. Komodakis (2016) Wide residual networks. arXiv preprint arXiv:1605.07146. Cited by: §1.
  • [33] B. Zhao, X. Sun, Y. Fu, Y. Yao, and Y. Wang (2018) MSplit lbi: realizing feature selection and dense estimation simultaneously in few-shot and zero-shot learning. arXiv preprint arXiv:1806.04360. Cited by: §2.