This project page provides PyTorch code that implements the following CVPR 2018 paper:
Title: "Dynamic Few-Shot Visual Learning without Forgetting"
Authors: Spyros Gidaris, Nikos Komodakis
Institution: Université Paris-Est, École des Ponts ParisTech
Code: https://github.com/gidariss/FewShotWithoutForgetting
Arxiv: https://arxiv.org/abs/1804.09458
Abstract:
The human visual system has the remarkable ability to effortlessly learn novel concepts from only a few examples. Mimicking the same behavior in machine vision systems is an interesting and very challenging research problem with many practical advantages for real-world vision applications. In this context, the goal of our work is to devise a few-shot visual learning system that, at test time, can efficiently learn novel categories from only a few training examples while not forgetting the initial categories on which it was trained (here called base categories). To achieve that goal we propose (a) to extend an object recognition system with an attention-based few-shot classification weight generator, and (b) to redesign the classifier of a ConvNet model as the cosine similarity function between feature representations and classification weight vectors. The latter, apart from unifying the recognition of both novel and base categories, also leads to feature representations that generalize better to unseen categories. We extensively evaluate our approach on Mini-ImageNet, where we improve on the prior state of the art in few-shot recognition.
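The cosine-similarity classifier of point (b) above can be illustrated in a few lines of plain Python. This is a simplified sketch, not the repository's implementation; in particular, `tau` stands in for the paper's learnable scale parameter.

```python
import math

def cosine_classifier_scores(feature, weights, tau=10.0):
    """Score a feature vector against per-class weight vectors using
    scaled cosine similarity instead of a plain dot product.
    Sketch only: `tau` models the learnable scale parameter."""
    def l2_normalize(v):
        norm = math.sqrt(sum(x * x for x in v)) or 1.0
        return [x / norm for x in v]

    f = l2_normalize(feature)
    return [tau * sum(fi * wi for fi, wi in zip(f, l2_normalize(w)))
            for w in weights]

# Scores depend only on vector direction, not magnitude, which is why
# base-class and generated novel-class weights live on a comparable scale.
scores = cosine_classifier_scores([3.0, 4.0], [[1.0, 0.0], [0.0, 2.0]])
```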
If you find the code useful in your research, please consider citing our CVPR 2018 paper:
@inproceedings{gidaris2018dynamic,
title={Dynamic Few-Shot Visual Learning without Forgetting},
author={Gidaris, Spyros and Komodakis, Nikos},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition},
pages={4367--4375},
year={2018}
}
The code was developed and tested with PyTorch version 0.2.0_4.
This code is released under the MIT License (refer to the LICENSE file for details).
First, you must download the MiniImageNet dataset from here and set in dataloader.py the path to where the dataset resides on your machine. We recommend creating a dataset directory (mkdir datasets) and placing the downloaded dataset there.
(1) In order to run the 1st training stage of our approach (which trains a recognition model with a cosine-similarity-based classifier and a feature extractor with 128 feature channels on its last convolutional layer), run the following command:
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv128CosineClassifier
The above command launches the training routine using the configuration file ./config/miniImageNet_Conv128CosineClassifier.py, which is specified by the --config argument (i.e., --config=miniImageNet_Conv128CosineClassifier). Note that all the experiment configuration files are placed in the ./config directory.
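For orientation, an experiment configuration file is an ordinary Python module that builds a dictionary of options. The sketch below is hypothetical; the actual option names used by the repository's config files may differ.

```python
# Hypothetical sketch of a ./config/*.py experiment file; the real
# option names in the repository may differ.
config = {}

# Options for the training-set data loader.
config['data_train_opt'] = {
    'dataset_name': 'MiniImageNet',
    'batch_size': 8,
    'epoch_size': 4000,
}

# Total number of training epochs.
config['max_num_epochs'] = 60
```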
(2) In order to run the 2nd training stage of our approach (which trains the few-shot classification weight generator with attention-based weight inference), run the following commands:
# Training the model for the 1-shot case on the training set of MiniImageNet.
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv128CosineClassifierGenWeightAttN1
# Training the model for the 5-shot case on the training set of MiniImageNet.
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv128CosineClassifierGenWeightAttN5
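The weight generator trained in this stage combines two terms: an average of the (normalized) support features of the novel class, and an attention-weighted sum of the base classification weights. The pure-Python sketch below illustrates the idea only; the mixing coefficients `phi_avg`/`phi_att` stand in for the learned parameters, and the real implementation operates on batched tensors.

```python
import math

def l2n(v):
    """L2-normalize a vector (with a guard for the zero vector)."""
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]

def generate_novel_weight(support_features, base_weights,
                          phi_avg=0.5, phi_att=0.5):
    """Sketch of the attention-based weight generator: mix (a) the
    average of the normalized support features with (b) an
    attention-weighted sum of the base classification weights.
    phi_avg/phi_att stand in for the learned mixing coefficients."""
    dim = len(support_features[0])
    feats = [l2n(f) for f in support_features]
    # (a) feature-averaging term.
    w_avg = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
    # (b) attention over base weights via a softmax of cosine similarities.
    bases = [l2n(w) for w in base_weights]
    w_att = [0.0] * dim
    for f in feats:
        sims = [sum(fi * bi for fi, bi in zip(f, b)) for b in bases]
        m = max(sims)
        exps = [math.exp(s - m) for s in sims]
        z = sum(exps)
        att = [e / z for e in exps]
        for a, b in zip(att, bases):
            for d in range(dim):
                w_att[d] += a * b[d] / len(feats)
    return [phi_avg * wa + phi_att * wt for wa, wt in zip(w_avg, w_att)]
```

With a single support example the attention term pulls the generated weight toward the most similar base-class weights, which is how base-class knowledge transfers to the novel class.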
(3) In order to evaluate the above models run the following commands:
# Evaluating the model for the 1-shot case on the test set of MiniImageNet.
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=miniImageNet_Conv128CosineClassifierGenWeightAttN1 --testset
# Evaluating the model for the 5-shot case on the test set of MiniImageNet.
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=miniImageNet_Conv128CosineClassifierGenWeightAttN5 --testset
(4) In order to train and evaluate our approach with different types of feature extractors (e.g., Conv32, Conv64, or ResNetLike; see our paper for a description of these feature extractors), run the following commands:
#************************** Feature extractor: Conv32 *****************************
# 1st training stage that trains the cosine-based recognition model.
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv32CosineClassifier
# 2nd training stage that trains the few-shot weight generator for the 1-shot and 5-shot models.
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv32CosineClassifierGenWeightAttN1
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv32CosineClassifierGenWeightAttN5
# Evaluate the 1-shot and 5-shot models.
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=miniImageNet_Conv32CosineClassifierGenWeightAttN1 --testset
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=miniImageNet_Conv32CosineClassifierGenWeightAttN5 --testset
#************************** Feature extractor: Conv64 *****************************
# 1st training stage that trains the cosine-based recognition model.
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv64CosineClassifier
# 2nd training stage that trains the few-shot weight generator for the 1-shot and 5-shot models.
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv64CosineClassifierGenWeightAttN1
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv64CosineClassifierGenWeightAttN5
# Evaluate the 1-shot and 5-shot models.
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=miniImageNet_Conv64CosineClassifierGenWeightAttN1 --testset
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=miniImageNet_Conv64CosineClassifierGenWeightAttN5 --testset
#************************** Feature extractor: ResNetLike *****************************
# 1st training stage that trains the cosine-based recognition model.
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_ResNetLikeCosine
# 2nd training stage that trains the few-shot weight generator for the 1-shot and 5-shot models.
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_ResNetLikeCosineClassifierGenWeightAttN1
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_ResNetLikeCosineClassifierGenWeightAttN5
# Evaluate the 1-shot and 5-shot models.
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=miniImageNet_ResNetLikeCosineClassifierGenWeightAttN1 --testset
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=miniImageNet_ResNetLikeCosineClassifierGenWeightAttN5 --testset
In order to train and evaluate our implementations of Matching Networks [3] and Prototypical Networks [4], run the following commands:
# Train and evaluate the matching networks model for the 1-shot case.
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv128MatchingNetworkN1
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=miniImageNet_Conv128MatchingNetworkN1 --testset
# Train and evaluate the matching networks model for the 5-shot case.
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv128MatchingNetworkN5
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=miniImageNet_Conv128MatchingNetworkN5 --testset
# Train and evaluate the prototypical networks model for the 1-shot case.
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv128PrototypicalNetworkN1
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=miniImageNet_Conv128PrototypicalNetworkN1 --testset
# Train and evaluate the prototypical networks model for the 5-shot case.
CUDA_VISIBLE_DEVICES=0 python train.py --config=miniImageNet_Conv128PrototypicalNetworkN5
CUDA_VISIBLE_DEVICES=0 python evaluate.py --config=miniImageNet_Conv128PrototypicalNetworkN5 --testset
Here we provide experimental results of our approach, as well as of our implementations of Matching Networks and Prototypical Networks, on the test set of Mini-ImageNet. Note that after cleaning and refactoring the implementation code of the paper and re-running the experiments, the results we obtained are slightly different.
1-shot results:
Approach | Feature extractor | Novel | Base | Both |
---|---|---|---|---|
Matching Networks [3] | Conv64 | 43.60% | - | - |
Prototypical Networks [4] | Conv64 | 49.42% +/- 0.78 | - | - |
Ravi and Larochelle [5] | Conv32 | 43.40% +/- 0.77 | - | - |
Finn et al. [6] | Conv64 | 48.70% +/- 1.84 | - | - |
Mishra et al. [7] | ResNet | 55.71% +/- 0.99 | - | - |
Matching Networks (our implementation) | Conv64 | 53.65% +/- 0.80 | - | - |
Matching Networks (our implementation) | Conv128 | 54.32% +/- 0.80 | - | - |
Prototypical Networks (our implementation) | Conv64 | 53.30% +/- 0.79 | - | - |
Prototypical Networks (our implementation) | Conv128 | 54.14% +/- 0.80 | - | - |
Prototypical Networks (our implementation) | ResNet | 53.74% +/- 0.91 | - | - |
(Ours) Cosine & Att. Weight Gen. | Conv32 | 54.49% +/- 0.83 | 61.59% | 44.79% |
(Ours) Cosine & Att. Weight Gen. | Conv64 | 55.86% +/- 0.85 | 68.43% | 47.75% |
(Ours) Cosine & Att. Weight Gen. | Conv128 | 56.62% +/- 0.84 | 70.80% | 49.44% |
(Ours) Cosine & Att. Weight Gen. | ResNet | 56.21% +/- 0.83 | 79.79% | 52.81% |
5-shot results:
Approach | Feature extractor | Novel | Base | Both |
---|---|---|---|---|
Matching Networks [3] | Conv64 | 55.30% | - | - |
Prototypical Networks [4] | Conv64 | 68.20% +/- 0.66 | - | - |
Ravi and Larochelle [5] | Conv32 | 60.20% +/- 0.71 | - | - |
Finn et al. [6] | Conv64 | 63.10% +/- 0.92 | - | - |
Mishra et al. [7] | ResNet | 68.88% +/- 0.92 | - | - |
Matching Networks (our implementation) | Conv64 | 65.76% +/- 0.68 | - | - |
Matching Networks (our implementation) | Conv128 | 65.97% +/- 0.65 | - | - |
Prototypical Networks (our implementation) | Conv64 | 70.33% +/- 0.65 | - | - |
Prototypical Networks (our implementation) | Conv128 | 70.74% +/- 0.66 | - | - |
(Ours) Cosine & Att. Weight Gen. | Conv32 | 70.12% +/- 0.67 | 60.83% | 53.22% |
(Ours) Cosine & Att. Weight Gen. | Conv64 | 72.49% +/- 0.62 | 67.64% | 57.21% |
(Ours) Cosine & Att. Weight Gen. | Conv128 | 72.82% +/- 0.63 | 71.00% | 59.05% |
(Ours) Cosine & Att. Weight Gen. | ResNet | 70.64% +/- 0.66 | 79.56% | 59.48% |
Here we provide instructions on how to train and evaluate our approach on the ImageNet-based low-shot benchmark proposed by Hariharan and Girshick [1].
(1) First, you must download the ImageNet dataset and set in dataloader.py the path to where the dataset resides on your machine. We recommend creating a dataset directory (mkdir datasets) and placing the downloaded dataset there.
(2) Launch the 1st training stage of our approach by running the following command:
CUDA_VISIBLE_DEVICES=0 python lowshot_train_stage1.py --config=imagenet_ResNet10CosineClassifier
The above command will train a recognition model with a ResNet10 feature extractor and a cosine-similarity-based classifier for 100 epochs (which takes around 120 hours). Alternatively, you can download the recognition model that we trained from here. In that case you should place the model inside the ./experiments directory under the name ./experiments/imagenet_ResNet10CosineClassifier.
(3) Extract and save the ResNet10 features (using the model trained above) for the images of the ImageNet dataset:
# Extract features from the validation image split of ImageNet.
CUDA_VISIBLE_DEVICES=0 python lowshot_save_features.py --config=imagenet_ResNet10CosineClassifier --split=val
# Extract features from the training image split of ImageNet.
CUDA_VISIBLE_DEVICES=0 python lowshot_save_features.py --config=imagenet_ResNet10CosineClassifier --split=train
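Caching features this way means the expensive ResNet10 forward pass over all of ImageNet is run only once, and the subsequent stage can read precomputed features from disk. The idea can be sketched as follows; the function name, batch format, and pickle-based storage are hypothetical, not the repository's actual format.

```python
import pickle

def save_features(extract_fn, batches, out_path):
    """Run a frozen feature extractor over the dataset once and cache
    the results, so later training stages read precomputed features
    instead of re-running the network on every epoch.
    Sketch only: extract_fn and the pickle layout are hypothetical."""
    cache = {'features': [], 'labels': []}
    for images, labels in batches:
        cache['features'].extend(extract_fn(images))
        cache['labels'].extend(labels)
    with open(out_path, 'wb') as fh:
        pickle.dump(cache, fh)
    return len(cache['labels'])
```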
(4) Launch the 2nd training stage of our approach (which trains the few-shot classification weight generator with attention-based weight inference) by running the following commands:
# Train the model for the 1-shot case.
CUDA_VISIBLE_DEVICES=0 python lowshot_train_stage2.py --config=imagenet_ResNet10CosineClassifierWeightAttN1
# Train the model for the 2-shot case.
CUDA_VISIBLE_DEVICES=0 python lowshot_train_stage2.py --config=imagenet_ResNet10CosineClassifierWeightAttN2
# Train the model for the 5-shot case.
CUDA_VISIBLE_DEVICES=0 python lowshot_train_stage2.py --config=imagenet_ResNet10CosineClassifierWeightAttN5
# Train the model for the 10-shot case.
CUDA_VISIBLE_DEVICES=0 python lowshot_train_stage2.py --config=imagenet_ResNet10CosineClassifierWeightAttN10
# Train the model for the 20-shot case.
CUDA_VISIBLE_DEVICES=0 python lowshot_train_stage2.py --config=imagenet_ResNet10CosineClassifierWeightAttN20
(5) Evaluate the above trained models by running the following commands:
# Evaluate the 1-shot model.
CUDA_VISIBLE_DEVICES=0 python lowshot_evaluate.py --config=imagenet_ResNet10CosineClassifierWeightAttN1 --testset
# Evaluate the 2-shot model.
CUDA_VISIBLE_DEVICES=0 python lowshot_evaluate.py --config=imagenet_ResNet10CosineClassifierWeightAttN2 --testset
# Evaluate the 5-shot model.
CUDA_VISIBLE_DEVICES=0 python lowshot_evaluate.py --config=imagenet_ResNet10CosineClassifierWeightAttN5 --testset
# Evaluate the 10-shot model.
CUDA_VISIBLE_DEVICES=0 python lowshot_evaluate.py --config=imagenet_ResNet10CosineClassifierWeightAttN10 --testset
# Evaluate the 20-shot model.
CUDA_VISIBLE_DEVICES=0 python lowshot_evaluate.py --config=imagenet_ResNet10CosineClassifierWeightAttN20 --testset
Here we evaluate our approach on the ImageNet-based low-shot benchmark proposed by Hariharan and Girshick [1], using the improved evaluation metrics proposed by Wang et al. [2]. All approaches use a ResNet10 feature extractor. Note that after cleaning and refactoring the implementation code of the paper and re-running the experiments, the results we obtained are slightly different. A pre-trained ResNet10 model with a cosine-similarity-based classifier is provided here: imagenet_ResNet10CosineClassifier
1-shot results:
Approach | Novel | All | All with prior |
---|---|---|---|
Prototypical Networks (from [2]) | 39.30% | 49.50% | 53.60% |
Matching Networks (from [2]) | 43.60% | 54.40% | 54.50% |
Logistic regression (from [2]) | 38.40% | 40.80% | 52.90% |
Logistic regression w/ H (from [2]) | 40.70% | 52.20% | 53.20% |
Prototype Matching Nets [2] | 43.30% | 55.80% | 54.70% |
Prototype Matching Nets w/ H [2] | 45.80% | 57.60% | 56.40% |
(Ours) Cosine & Att. Weight Gen. | 46.26% +/- 0.20 | 58.29% +/- 0.13 | 56.88% +/- 0.13 |
2-shot results:
Approach | Novel | All | All with prior |
---|---|---|---|
Prototypical Networks (from [2]) | 54.40% | 61.00% | 61.40% |
Matching Networks (from [2]) | 54.00% | 61.00% | 60.70% |
Logistic regression (from [2]) | 51.10% | 49.90% | 60.40% |
Logistic regression w/ H (from [2]) | 50.80% | 59.40% | 59.10% |
Prototype Matching Nets [2] | 55.70% | 63.10% | 62.00% |
Prototype Matching Nets w/ H [2] | 57.80% | 64.70% | 63.30% |
(Ours) Cosine & Att. Weight Gen. | 57.46% +/- 0.16 | 65.11% +/- 0.10 | 63.67% +/- 0.09 |
5-shot results:
Approach | Novel | All | All with prior |
---|---|---|---|
Prototypical Networks (from [2]) | 66.30% | 69.70% | 68.80% |
Matching Networks (from [2]) | 66.00% | 69.00% | 68.20% |
Logistic regression (from [2]) | 64.80% | 64.20% | 68.60% |
Logistic regression w/ H (from [2]) | 62.00% | 67.60% | 66.80% |
Prototype Matching Nets [2] | 68.40% | 71.10% | 70.20% |
Prototype Matching Nets w/ H [2] | 69.00% | 71.90% | 70.60% |
(Ours) Cosine & Att. Weight Gen. | 69.27% +/- 0.09 | 72.70% +/- 0.06 | 71.24% +/- 0.06 |
10-shot results:
Approach | Novel | All | All with prior |
---|---|---|---|
Prototypical Networks (from [2]) | 71.20% | 72.90% | 72.00% |
Matching Networks (from [2]) | 72.50% | 73.70% | 72.60% |
Logistic regression (from [2]) | 71.60% | 71.90% | 72.90% |
Logistic regression w/ H (from [2]) | 69.30% | 72.80% | 71.70% |
Prototype Matching Nets [2] | 74.00% | 75.00% | 73.90% |
Prototype Matching Nets w/ H [2] | 74.30% | 75.20% | 74.00% |
(Ours) Cosine & Att. Weight Gen. | 74.84% +/- 0.06 | 76.51% +/- 0.04 | 75.00% +/- 0.04 |
20-shot results:
Approach | Novel | All | All with prior |
---|---|---|---|
Prototypical Networks (from [2]) | 73.90% | 74.60% | 73.80% |
Matching Networks (from [2]) | 76.90% | 76.50% | 75.60% |
Logistic regression (from [2]) | 76.60% | 76.90% | 76.30% |
Logistic regression w/ H (from [2]) | 76.50% | 76.90% | 76.30% |
Prototype Matching Nets [2] | 77.00% | 77.10% | 75.90% |
Prototype Matching Nets w/ H [2] | 77.40% | 77.50% | 76.20% |
(Ours) Cosine & Att. Weight Gen. | 78.11% +/- 0.05 | 78.74% +/- 0.03 | 77.28% +/- 0.03 |
[1] B. Hariharan and R. Girshick. Low-shot visual recognition by shrinking and hallucinating features.
[2] Y.-X. Wang, R. Girshick, M. Hebert, and B. Hariharan. Low-shot learning from imaginary data.
[3] O. Vinyals et al. Matching networks for one shot learning.
[4] J. Snell, K. Swersky, and R. S. Zemel. Prototypical networks for few-shot learning.
[5] S. Ravi and H. Larochelle. Optimization as a model for few-shot learning.
[6] C. Finn, P. Abbeel, and S. Levine. Model-agnostic meta-learning for fast adaptation of deep networks.
[7] N. Mishra, M. Rohaninejad, X. Chen, and P. Abbeel. A simple neural attentive meta-learner.