Figure 1: Our proposed Resampling at image-level and object-level (RIO).
Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection.
Nadine Chang, Zhiding Yu, Yu-Xiong Wang, Anima Anandkumar, Sanja Fidler, Jose M. Alvarez.
ICML 2021.
This repository contains the official PyTorch implementation of the training & evaluation code and the pretrained models for RIO.
Training on datasets with long-tailed distributions has been challenging for major recognition tasks such as classification and detection. To deal with this challenge, image resampling is typically introduced as a simple but effective approach. However, we observe that long-tailed detection differs from classification since multiple classes may be present in one image. As a result, image resampling alone is not enough to yield a sufficiently balanced distribution at the object level. We address object-level resampling by introducing an object-centric memory replay strategy based on dynamic, episodic memory banks. Our proposed strategy has two benefits: 1) convenient object-level resampling without significant extra computation, and 2) implicit feature-level augmentation from model updates. We show that image-level and object-level resamplings are both important, and thus unify them with a joint resampling strategy (RIO). Our method outperforms state-of-the-art long-tailed detection and segmentation methods on LVIS v0.5 across various backbones.
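The object-level resampling idea can be pictured with a small, self-contained sketch: per-class queues store recently seen object (RoI) features, and replaying from them yields a roughly class-balanced set of objects without re-reading images. This is only an illustration under assumed names (`ObjectMemoryBank`, `update`, `sample`), not the repository's API; the actual implementation lives under projects/RIO.

```python
from collections import defaultdict, deque
import random
import torch

class ObjectMemoryBank:
    """Illustrative object-centric, episodic memory bank (not the repo's API)."""

    def __init__(self, capacity_per_class=32):
        # One bounded queue of stored object features per class.
        self.banks = defaultdict(lambda: deque(maxlen=capacity_per_class))

    @torch.no_grad()
    def update(self, features, labels):
        # Store detached copies of per-object (RoI) features, keyed by class label.
        for feat, label in zip(features, labels):
            self.banks[int(label)].append(feat.detach().cpu())

    def sample(self, per_class=2):
        # Replay a roughly class-balanced set of stored object features.
        feats, labels = [], []
        for cls, bank in self.banks.items():
            if not bank:
                continue
            k = min(per_class, len(bank))
            for feat in random.sample(list(bank), k):
                feats.append(feat)
                labels.append(cls)
        if not feats:
            return None, None
        return torch.stack(feats), torch.tensor(labels)
```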
- Linux or macOS with Python >= 3.6
- PyTorch >= 1.5 and a torchvision version that matches your PyTorch installation. Please refer to the download guidelines on the PyTorch website; a quick environment check is sketched after this list.
- Detectron2
- OpenCV is optional but required for visualizations
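A quick, optional sanity check of the environment (a minimal snippet that only imports the dependencies above and prints their versions):

```python
# Optional sanity check: confirm the core dependencies are importable.
import torch
import torchvision
import detectron2
import cv2  # optional, only needed for visualizations

print("PyTorch:", torch.__version__)                 # expect >= 1.5
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())
print("Detectron2:", detectron2.__version__)
print("OpenCV:", cv2.__version__)
```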
Please refer to the installation instructions in Detectron2.
We use Detectron2 v0.3 as the codebase. Thus, we advise installing Detectron2 from a clone of this repository.
Dataset download is available at the official LVIS website. Please follow Detectron2's guidelines on the expected LVIS dataset structure.
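For reference, the expected layout roughly follows Detectron2's datasets README (LVIS reuses the COCO train/val images); please double-check the exact file names against that README:

```
datasets/
  coco/
    train2017/
    val2017/
  lvis/
    lvis_v0.5_train.json
    lvis_v0.5_val.json
```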
- Python 3.6.9
- PyTorch 1.5.0 with CUDA 10.2
- Detectron2 built from this repository.
Detection and Instance Segmentation on LVIS v0.5 (the r/c/f subscripts denote AP on rare, common, and frequent classes)
Backbone | Method | box AP | box AP.r | box AP.c | box AP.f | mask AP | mask AP.r | mask AP.c | mask AP.f | download |
---|---|---|---|---|---|---|---|---|---|---|
R50-FPN | MaskRCNN-RIO | 25.7 | 17.2 | 25.1 | 29.8 | 26.0 | 18.9 | 26.2 | 28.5 | model |
R101-FPN | MaskRCNN-RIO | 27.3 | 19.1 | 26.8 | 31.2 | 27.7 | 20.1 | 28.3 | 30.0 | model |
X101-FPN | MaskRCNN-RIO | 28.6 | 19.0 | 28.0 | 33.0 | 28.9 | 19.5 | 29.7 | 31.6 | model |
Our code is located under projects/RIO.
Our training and evaluation follow Detectron2's workflow. We provide config files for both LVIS v0.5 and LVIS v1.0.
Example: Training LVISv0.5 on Mask-RCNN ResNet-50
# We advise multi-gpu training
cd projects/RIO
python memory_train_net.py \
--num-gpus 4 \
--config-file=configs/LVISv0.5-InstanceSegmentation/memory_mask_rcnn_R_50_FPN_1x.yaml
Example: Evaluating LVISv0.5 on Mask-RCNN ResNet-50
cd projects/RIO
python memory_train_net.py \
--config-file configs/LVISv0.5-InstanceSegmentation/memory_mask_rcnn_R_50_FPN_1x.yaml \
--eval-only MODEL.WEIGHTS /path/to/model_checkpoint
By default, LVIS evaluation follows immediately after training.
Detectron2 has built-in visualization tools. Under the tools folder, visualize_json_results.py can be used to visualize the JSON instance detection/segmentation results produced by LVISEvaluator.
python visualize_json_results.py --input x.json --output dir/ --dataset lvis
Further information can be found in the README under Detectron2's tools folder.
Please check the LICENSE file. RIO may be used non-commercially, meaning for research or evaluation purposes only. For business inquiries, please contact researchinquiries@nvidia.com.
@article{chang2021image,
title={Image-Level or Object-Level? A Tale of Two Resampling Strategies for Long-Tailed Detection},
author={Chang, Nadine and Yu, Zhiding and Wang, Yu-Xiong and Anandkumar, Anima and Fidler, Sanja and Alvarez, Jose M},
journal={arXiv preprint arXiv:2104.05702},
year={2021}
}