[T-PAMI 2022] Meta-DETR for Few-Shot Object Detection: Official PyTorch Implementation

ZhangGongjie/Meta-DETR

[T-PAMI 2022] Meta-DETR
(Official PyTorch Implementation)


This repository is the official PyTorch implementation of the T-PAMI 2022 paper "Meta-DETR: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation" by Gongjie Zhang, Zhipeng Luo, Kaiwen Cui, Shijian Lu, and Eric P. Xing.

[ Important Notice ]    Meta-DETR first appeared as a tech report on arXiv.org (https://arxiv.org/abs/2103.11731v2) in 2021. Since that release, we have made substantial improvements to the original version. This repository corresponds to the final published version accepted by IEEE Transactions on Pattern Analysis and Machine Intelligence (T-PAMI) in 2022. Please refer to the latest version of the paper.


 

Brief Introduction

Meta-DETR is a state-of-the-art few-shot object detector that performs image-level, meta-learning-based prediction and exploits inter-class correlation to improve generalization from old knowledge to new classes. Because it makes predictions at the image level rather than on region proposals, Meta-DETR entirely bypasses the proposal quality gap between base and novel classes and thus outperforms R-CNN-based few-shot object detectors. In addition, Meta-DETR performs meta-learning on a set of support classes in one go, which lets it leverage inter-class correlation for better generalization.

Please check our T-PAMI paper or its preprint version for more details.


 

Installation

Pre-Requisites

You must have NVIDIA GPUs to run the code.

The code was developed and tested in the following environment:

  • Ubuntu 18.04 LTS
  • 8x NVIDIA V100 GPUs (32GB)
  • CUDA 10.2
  • Python == 3.7
  • PyTorch == 1.7.1+cu102, TorchVision == 0.8.2+cu102
  • GCC == 7.5.0
  • cython, pycocotools, tqdm, scipy

We recommend using the exact setup above. However, other environments (Linux, Python>=3.7, CUDA>=9.2, GCC>=5.4, PyTorch>=1.5.1, TorchVision>=0.6.1) should also work.
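The minimum versions listed above can be checked programmatically. The following is a minimal sketch, not part of the repository: the helper names `parse_version`, `meets_minimum`, and `check_environment` are hypothetical, and only the Python, PyTorch, and TorchVision minimums from the note above are encoded.

```python
import re

# Minimum versions quoted from the compatibility note above.
MINIMUMS = {"python": "3.7", "torch": "1.5.1", "torchvision": "0.6.1"}


def parse_version(text):
    """Turn '1.7.1+cu102' into (1, 7, 1), ignoring any local build suffix."""
    core = text.split("+", 1)[0]
    parts = [int(p) for p in re.findall(r"\d+", core)][:3]
    # Pad short versions such as '3.7' so that they compare as (3, 7, 0).
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment(installed):
    """Map each package in MINIMUMS to True/False for a dict of installed versions."""
    return {name: meets_minimum(installed[name], MINIMUMS[name]) for name in MINIMUMS}


# Example with the recommended setup from the list above.
print(check_environment({
    "python": "3.7.10",
    "torch": "1.7.1+cu102",
    "torchvision": "0.8.2+cu102",
}))
```

In a live environment, the installed versions would typically come from `platform.python_version()`, `torch.__version__`, and `torchvision.__version__`.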

 

Code Installation

First, clone the repository locally:

git clone https://github.com/ZhangGongjie/Meta-DETR.git

We recommend using Anaconda to create a conda environment:

conda create -n meta_detr python=3.7 pip

Then, activate the environment:

conda activate meta_detr

Then, install PyTorch and TorchVision:

(preferably using our recommended setup; the CUDA version should match your local environment)

conda install pytorch=1.7.1 torchvision=0.8.2 cudatoolkit=10.2 -c pytorch

After that, install other requirements:

conda install cython scipy tqdm
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'

As Meta-DETR is developed upon Deformable DETR, you need to compile Deformable Attention.

# compile CUDA operators of Deformable Attention
cd Meta-DETR
cd ./models/ops
sh ./make.sh
python test.py  # unit test (all checks should report True)

 

Data Preparation

MS-COCO for Few-Shot Object Detection

Please download the COCO 2017 dataset and organize it as follows:

code_root/
└── data/
    ├── coco_fewshot/        # Few-shot dataset 
    └── coco/                # MS-COCO dataset
        ├── train2017/
        ├── val2017/
        └── annotations/
            ├── instances_train2017.json
            └── instances_val2017.json
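
Before launching training, it can help to confirm that the layout above is in place. The sketch below is an illustration rather than part of the repository: `EXPECTED_COCO_PATHS` and `missing_coco_paths` are hypothetical names, and the paths are copied from the directory tree above.

```python
from pathlib import Path

# Relative paths taken from the directory tree above.
EXPECTED_COCO_PATHS = [
    "data/coco_fewshot",
    "data/coco/train2017",
    "data/coco/val2017",
    "data/coco/annotations/instances_train2017.json",
    "data/coco/annotations/instances_val2017.json",
]


def missing_coco_paths(code_root):
    """Return the expected paths that do not exist under code_root."""
    root = Path(code_root)
    return [rel for rel in EXPECTED_COCO_PATHS if not (root / rel).exists()]


# Example: check the current directory, assuming it is the repository root.
for rel in missing_coco_paths("."):
    print("missing:", rel)
```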

The coco_fewshot folder (already provided in this repo) contains randomly sampled few-shot datasets as described in the paper, including the five data setups with different random seeds. In each K-shot (K=1,3,5,10,30) data setup, we ensure that there are exactly K object instances for each novel class. The numbers of base-class object instances vary.
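
The "exactly K instances per novel class" property can be verified with a short script. This is a minimal sketch under stated assumptions: it assumes a few-shot file has been loaded into a COCO-style dictionary with an `annotations` list whose entries carry a `category_id`, and the function names and toy category ids are hypothetical. The list of novel category ids must be supplied by the caller.

```python
from collections import Counter


def instances_per_category(coco_dict):
    """Count annotated object instances per category_id in a COCO-style dict."""
    return Counter(ann["category_id"] for ann in coco_dict["annotations"])


def classes_violating_k(coco_dict, novel_category_ids, k):
    """Return {category_id: count} for novel classes whose count is not exactly k."""
    counts = instances_per_category(coco_dict)
    return {
        cid: counts.get(cid, 0)
        for cid in novel_category_ids
        if counts.get(cid, 0) != k
    }


# Toy data with made-up category ids: class 1 has 2 instances, class 2 has 1.
toy = {"annotations": [{"category_id": 1}, {"category_id": 1}, {"category_id": 2}]}
print(classes_violating_k(toy, novel_category_ids=[1, 2], k=2))
```

An empty result means every listed novel class has exactly K instances. Base classes are deliberately not checked, since their instance counts vary.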

Pascal VOC for Few-Shot Object Detection

We transform the original Pascal VOC dataset format into MS-COCO format for parsing. The transformed Pascal VOC dataset is available for download at GoogleDrive.

After downloading the MS-COCO-style Pascal VOC dataset, please organize it as follows:

code_root/
└── data/
    ├── voc_fewshot_split1/     # VOC Few-shot dataset
    ├── voc_fewshot_split2/     # VOC Few-shot dataset
    ├── voc_fewshot_split3/     # VOC Few-shot dataset
    └── voc/                    # MS-COCO-Style Pascal VOC dataset
        ├── images/
        └── annotations/
            ├── xxxxx.json
            ├── yyyyy.json
            └── zzzzz.json

Similarly, the few-shot datasets for Pascal VOC are also provided in this repo (voc_fewshot_split1, voc_fewshot_split2, and voc_fewshot_split3). For each class split, there are 10 data setups with different random seeds. In each K-shot (K=1,2,3,5,10) data setup, we ensure that there are exactly K object instances for each novel class. The numbers of base-class object instances vary.


 

Usage

Reproducing Paper Results

All scripts for reproducing the results reported in our T-PAMI paper are stored in ./scripts. Their arguments are intended to be self-explanatory.

Taking MS-COCO as an example, run the following command to reproduce the paper results:

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8 ./scripts/run_experiments_coco.sh

To Perform Base Training

We take MS-COCO as an example. First, create basetrain.sh and copy the following commands into it.

EXP_DIR=exps/coco
BASE_TRAIN_DIR=${EXP_DIR}/base_train
# -p creates parent directories and does not fail if they already exist
mkdir -p ${BASE_TRAIN_DIR}

python -u main.py \
    --dataset_file coco_base \
    --backbone resnet101 \
    --num_feature_levels 1 \
    --enc_layers 6 \
    --dec_layers 6 \
    --hidden_dim 256 \
    --num_queries 300 \
    --batch_size 4 \
    --category_codes_cls_loss \
    --epoch 25 \
    --lr_drop_milestones 20 \
    --save_every_epoch 5 \
    --eval_every_epoch 5 \
    --output_dir ${BASE_TRAIN_DIR} \
2>&1 | tee ${BASE_TRAIN_DIR}/log.txt

Then, run the command below to start base training.

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8  ./basetrain.sh
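
The base-training script above trains for 25 epochs with a single learning-rate drop milestone at epoch 20. As an illustration of what a milestone schedule of this kind does, here is a minimal sketch that follows the semantics of `torch.optim.lr_scheduler.MultiStepLR`. Whether the repository uses that scheduler, as well as the decay factor `gamma` and the base learning rate, are assumptions here; the example uses a base rate of 1.0 so that only the relative factor is shown.

```python
def lr_at_epoch(epoch, base_lr, milestones, gamma=0.1):
    """Learning rate in effect during a 0-indexed epoch under a step schedule.

    The rate is multiplied by gamma once for every milestone that has been
    reached (epoch >= milestone). gamma=0.1 is an assumed value.
    """
    drops = sum(1 for m in milestones if epoch >= m)
    return base_lr * (gamma ** drops)


# Relative learning-rate factor for the 25-epoch run with a milestone at 20.
for epoch in (0, 19, 20, 24):
    print(epoch, lr_at_epoch(epoch, base_lr=1.0, milestones=[20]))
```

Under these assumptions, epochs 0 to 19 run at the full rate and epochs 20 to 24 run at the reduced rate.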

To Perform Few-Shot Finetuning

We take MS-COCO as an example. First, create fsfinetune.sh and copy the following commands into it.

EXP_DIR=exps/coco
BASE_TRAIN_DIR=${EXP_DIR}/base_train
# -p creates parent directories and does not fail if they already exist
mkdir -p ${BASE_TRAIN_DIR}

fewshot_seed=01
num_shot=10
epoch=500
lr_drop1=300
lr_drop2=450
FS_FT_DIR=${EXP_DIR}/seed${fewshot_seed}_${num_shot}shot
mkdir -p ${FS_FT_DIR}

python -u main.py \
    --dataset_file coco_base \
    --backbone resnet101 \
    --num_feature_levels 1 \
    --enc_layers 6 \
    --dec_layers 6 \
    --hidden_dim 256 \
    --num_queries 300 \
    --batch_size 2 \
    --category_codes_cls_loss \
    --resume ${BASE_TRAIN_DIR}/checkpoint.pth \
    --fewshot_finetune \
    --fewshot_seed ${fewshot_seed} \
    --num_shots ${num_shot} \
    --epoch ${epoch} \
    --lr_drop_milestones ${lr_drop1} ${lr_drop2} \
    --warmup_epochs 50 \
    --save_every_epoch ${epoch} \
    --eval_every_epoch ${epoch} \
    --output_dir ${FS_FT_DIR} \
2>&1 | tee ${FS_FT_DIR}/log.txt

Note that --fewshot_finetune is required to indicate that training and inference are conducted on the few-shot setup. You also need to specify the number of shots, the few-shot random seed, the training schedule, and the path to the checkpoint produced by base training. Then, run the command below to start few-shot finetuning. After finetuning, the program automatically performs inference on novel classes.

GPUS_PER_NODE=8 ./tools/run_dist_launch.sh 8  ./fsfinetune.sh
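
When sweeping over several seeds and shot counts, it can be convenient to derive the output directory and sanity-check the schedule before launching. The sketch below mirrors the `FS_FT_DIR` naming from the script above; the function names and the specific checks are hypothetical and not part of the repository.

```python
def finetune_output_dir(exp_dir, fewshot_seed, num_shot):
    """Mirror FS_FT_DIR=${EXP_DIR}/seed${fewshot_seed}_${num_shot}shot.

    fewshot_seed is kept as a string so that a leading zero ('01') survives.
    """
    return f"{exp_dir}/seed{fewshot_seed}_{num_shot}shot"


def check_schedule(epochs, milestones, warmup_epochs):
    """Return a list of human-readable problems with a training schedule."""
    problems = []
    if list(milestones) != sorted(milestones):
        problems.append("milestones are not in increasing order")
    if any(m >= epochs for m in milestones):
        problems.append("a milestone falls at or after the last epoch")
    if warmup_epochs >= epochs:
        problems.append("warmup covers the whole run")
    return problems


# Values taken from the finetuning script above.
print(finetune_output_dir("exps/coco", "01", 10))
print(check_schedule(epochs=500, milestones=[300, 450], warmup_epochs=50))
```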

To Perform Only Inference After Few-Shot Finetuning

We take MS-COCO as an example. Simply run:

python -u main.py \
--dataset_file coco_base \
--backbone resnet101 \
--num_feature_levels 1 \
--enc_layers 6 \
--dec_layers 6 \
--hidden_dim 256 \
--num_queries 300 \
--batch_size 2 \
--category_codes_cls_loss \
--resume path/to/checkpoint.pth/generated/by/few-shot-finetuning \
--fewshot_finetune \
--fewshot_seed ${fewshot_seed} \
--num_shots ${num_shot} \
--eval \
2>&1 | tee ./log_inference.txt

Note that --eval and --resume path/to/checkpoint.pth/generated/by/few-shot-finetuning must be set correctly, and that ${fewshot_seed} and ${num_shot} must be defined in your shell, typically with the same values used during finetuning.
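
To run inference over several checkpoints, the command above can be assembled from Python. This is a minimal sketch: every flag is copied from the command above, while the function name `build_inference_command` and the checkpoint path in the example are hypothetical.

```python
import shlex


def build_inference_command(checkpoint, fewshot_seed, num_shot):
    """Assemble the inference command shown above as an argument list."""
    return [
        "python", "-u", "main.py",
        "--dataset_file", "coco_base",
        "--backbone", "resnet101",
        "--num_feature_levels", "1",
        "--enc_layers", "6",
        "--dec_layers", "6",
        "--hidden_dim", "256",
        "--num_queries", "300",
        "--batch_size", "2",
        "--category_codes_cls_loss",
        "--resume", str(checkpoint),
        "--fewshot_finetune",
        "--fewshot_seed", str(fewshot_seed),
        "--num_shots", str(num_shot),
        "--eval",
    ]


# Hypothetical checkpoint path; substitute the file written by your finetuning run.
command = build_inference_command("path/to/finetuned/checkpoint.pth", "01", 10)
print(" ".join(shlex.quote(arg) for arg in command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` from the repository root. Building the command as a list avoids shell quoting problems when checkpoint paths contain spaces.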


 

Pre-Trained Model Weights

We provide trained model weights after the base training stage for users to finetune.

All pre-trained model weights are stored in Google Drive.

  • MS-COCO after base training:   click here to download.

  • Pascal VOC Split 1 after base training:   click here to download.

  • Pascal VOC Split 2 after base training:   click here to download.

  • Pascal VOC Split 3 after base training:   click here to download.


 

License

The Meta-DETR implementation is released under the MIT license.

Please see the LICENSE file for more information.

However, prior works' licenses also apply. It is the users' responsibility to ensure compliance with all license requirements.


 

Citation

If you find Meta-DETR useful or inspiring, please consider citing:

@article{Meta-DETR-2022,
  author={Zhang, Gongjie and Luo, Zhipeng and Cui, Kaiwen and Lu, Shijian and Xing, Eric P.},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
  title={{Meta-DETR}: Image-Level Few-Shot Detection with Inter-Class Correlation Exploitation}, 
  year={2022},
  doi={10.1109/TPAMI.2022.3195735},
}

 

Acknowledgement

Meta-DETR is heavily inspired by many outstanding prior works, including DETR and Deformable DETR. We thank the authors of these projects for open-sourcing their implementations!
