OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision
This repository is the official PyTorch implementation of OV-DQUO.
OV-DQUO is an open-vocabulary detection framework that learns from open-world unknown objects through wildcard matching and contrastive denoising training methods, mitigating performance degradation in novel category detection caused by confidence bias.
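To give a rough intuition for the wildcard-matching idea, the toy sketch below scores region embeddings against known category embeddings plus one generic wildcard embedding. It is a conceptual illustration with random placeholder tensors, not the repository's implementation.

```python
# Conceptual sketch of wildcard matching (illustrative only, random placeholder tensors):
# regions from open-world unknown objects are scored against a generic "wildcard"
# text embedding in addition to the known category embeddings.
import torch
import torch.nn.functional as F

region_feats = torch.randn(5, 512)   # 5 region embeddings (toy values)
category_emb = torch.randn(3, 512)   # embeddings of known category names
wildcard_emb = torch.randn(1, 512)   # generic wildcard embedding

text_bank = torch.cat([category_emb, wildcard_emb], dim=0)            # (4, 512)
logits = F.normalize(region_feats, dim=-1) @ F.normalize(text_bank, dim=-1).T
print(logits.shape)  # (5, 4): the last column scores each region against the wildcard
```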
- Release the Model Code
- Release the Training and Evaluation Code
- Update the RoQIs Selection Code
- Linux with Python == 3.9.0
- CUDA 11.7
- The provided environment is recommended for reproducing our results; similar configurations may also work.
```bash
conda create -n OV-DQUO python=3.9.0
conda activate OV-DQUO
pip install torch==2.0.0 torchvision==0.15.1
# other dependencies
pip install -r requirements.txt
```
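Optionally, run a quick sanity check that the environment matches the versions listed above:

```python
# optional sanity check for the environment
import torch
import torchvision

print(torch.__version__)          # expected: 2.0.0 (CUDA 11.7 build)
print(torchvision.__version__)    # expected: 0.15.1
print(torch.version.cuda)         # expected: 11.7
print(torch.cuda.is_available())  # should be True before training
```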
Next, install detectron2==0.6 as instructed in the official tutorial (https://detectron2.readthedocs.io/en/latest/tutorials/install.html), then run:
```bash
pip install -e . -v
# compile the deformable attention CUDA operators
cd ./models/ops
sh ./make.sh
```
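If the build succeeds, detectron2 and the compiled operator should both be importable. The extension name below follows the usual Deformable-DETR/DINO convention for models/ops and is an assumption, not something stated in this README:

```python
# optional post-install check
# NOTE: the extension name is assumed from the Deformable-DETR / DINO convention
import detectron2
import MultiScaleDeformableAttention as MSDA  # built by models/ops/make.sh

print(detectron2.__version__)  # expected: 0.6
print(MSDA.__file__)           # location of the compiled CUDA extension
```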
Download the ResNet CLIP region prompt pretrained weights for the OV-COCO experiments from CORA, and place them in the pretrained directory.
Download the ViT-B/16 and ViT-L/14 pretrained weights for the OV-LVIS experiments from CLIPself, and place them in the pretrained directory.
For the OV-LVIS experiments, you also need to download the category name list, the pre-computed text embeddings, and the wildcard embedding from this link; place them in the pretrained directory as well.
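For reference, category text embeddings of this kind are typically obtained by encoding prompted category names with the CLIP text encoder. The sketch below uses the OpenAI clip package and placeholder category names; it illustrates the general recipe only and is not the exact script used to produce the provided files:

```python
# sketch: pre-computing category text embeddings with CLIP (illustrative only)
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

model, _ = clip.load("ViT-B/16", device="cpu")
categories = ["cat", "dog", "zebra"]  # placeholder category names
tokens = clip.tokenize([f"a photo of a {c}" for c in categories])
with torch.no_grad():
    text_emb = model.encode_text(tokens)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)  # L2-normalize
print(text_emb.shape)  # (3, 512) for ViT-B/16
```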
Please download the COCO dataset, unzip it, place it in the data directory, and make sure it has the following structure:

```
data/
  Annotations/
    instances_train2017.json
    instances_val2017.json
  Images/
    train2017
    val2017
```
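A small helper to confirm the layout, assuming the relative paths shown above:

```python
# check that the expected COCO files and folders are in place
from pathlib import Path

root = Path("data")
expected = [
    root / "Annotations" / "instances_train2017.json",
    root / "Annotations" / "instances_val2017.json",
    root / "Images" / "train2017",
    root / "Images" / "val2017",
]
for p in expected:
    print(f"{p}: {'OK' if p.exists() else 'MISSING'}")
```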
Please download the OV-COCO and OV-LVIS dataset annotations and place them in the data/Annotations folder.
Download the open-world pseudo labels and place them in the ow_labels folder.
To train OV-DQUO on the OV-COCO dataset, run one of the following scripts:

```bash
# distributed training with the RN50 backbone, 8 GPUs
bash scripts/OV-COCO/distrain_RN50.sh logs/r50_ovcoco
# distributed training with the RN50x4 backbone, 8 GPUs
bash scripts/OV-COCO/distrain_RN50x4.sh logs/r50x4_ovcoco
```
To train OV-DQUO on the OV-LVIS dataset, run one of the following scripts:

```bash
# distributed training with the ViT-B/16 backbone, 8 GPUs
bash scripts/OV-LVIS/distrain_ViTB16.sh logs/vitb_ovlvis
# distributed training with the ViT-L/14 backbone, 8 GPUs
bash scripts/OV-LVIS/distrain_ViTL14.sh logs/vitl_ovlvis
```
Our code can also run on a single GPU; the corresponding run scripts are in the scripts folder. However, we have not tested single-GPU training due to the long training time.
Since the OV-LVIS evaluation is very time-consuming and would significantly prolong training, we adopt an offline evaluation method. After training, run the following script to evaluate the results of each epoch:

```bash
# offline evaluation
python custom_tools/offline_lvis_eval.py -f logs/vitl_ovlvis -n 15 34 -c config/OV_LVIS/OVDQUO_ViTL14.py
```
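Conceptually, the offline evaluation walks over the per-epoch checkpoints saved in the log directory for the epoch range given by -n (here 15 to 34) and evaluates each one against the specified config. The loop below is only a rough sketch; the checkpoint naming and the evaluation entry point are assumptions, not the repository's actual interface:

```python
# rough sketch of the offline, per-epoch evaluation loop (names are hypothetical)
import subprocess
from pathlib import Path

log_dir = Path("logs/vitl_ovlvis")
config = "config/OV_LVIS/OVDQUO_ViTL14.py"

for epoch in range(15, 35):  # the "-n 15 34" epoch range
    ckpt = log_dir / f"checkpoint{epoch:04d}.pth"  # hypothetical naming scheme
    if not ckpt.exists():
        continue
    # hypothetical evaluation command; the real entry point lives in this repo
    subprocess.run(
        ["python", "main.py", "--eval", "-c", config, "--resume", str(ckpt)],
        check=True,
    )
```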
Results on OV-COCO:

Model name | AP50_Novel | Checkpoint
---|---|---
OVDQUO_RN50_COCO | 39.2 | model
OVDQUO_RN50x4_COCO | 45.6 | model

Results on OV-LVIS:

Model name | mAP_rare | Checkpoint
---|---|---
OVDQUO_ViT-B/16_LVIS | 29.7 | model
OVDQUO_ViT-L/14_LVIS | 39.3 | model
To evaluate our pretrained checkpoints on the OV-COCO dataset, download them from the links above, place them in the ckpt folder, and run:

```bash
# R50
bash scripts/OV-COCO/diseval_RN50.sh logs/r50_ovcoco_eval
# R50x4
bash scripts/OV-COCO/diseval_RN50x4.sh logs/r50x4_ovcoco_eval
```
To evaluate our pretrained checkpoints on the OV-LVIS dataset, download them from the links above, place them in the ckpt folder, and run:

```bash
# vit-b
bash scripts/OV-LVIS/diseval_ViTB16.sh logs/vitb_ovlvis_eval
# vit-l
bash scripts/OV-LVIS/diseval_ViTL14.sh logs/vitl_ovlvis_eval
```
If you find this repo useful, please consider citing our paper:
```bibtex
@misc{wang2024ovdquo,
  title={OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision},
  author={Junjie Wang and Bin Chen and Bin Kang and Yulin Li and YiChi Chen and Weizhi Xian and Huifeng Chang},
  year={2024},
  eprint={2405.17913},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
This repository was built on top of DINO, CORA, MEPU, and CLIPself. We thank the community for their efforts.