
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision

This repository is the official PyTorch implementation of OV-DQUO.


Overview

OV-DQUO is an open-vocabulary detection framework that learns from open-world unknown objects through wildcard matching and contrastive denoising training, mitigating the performance degradation on novel categories caused by confidence bias.
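
As a rough illustration (a minimal sketch of our reading of wildcard matching, not the repository's actual code), detector queries matched to open-world pseudo boxes can be supervised against a generic wildcard text embedding placed alongside the category embeddings:

# Minimal, hypothetical sketch of wildcard matching (not this repo's code):
# classification logits are cosine similarities between query features and
# CLIP text embeddings, with one extra "wildcard" column; open-world
# unknown objects are supervised through that wildcard column only.
import torch
import torch.nn.functional as F

def classification_logits(query_feats, cat_embeds, wildcard_embed, tau=0.07):
    # query_feats: (N, D) detector query features
    # cat_embeds:  (K, D) CLIP text embeddings of known category names
    # wildcard_embed: (D,) embedding of a generic prompt such as "object"
    text = torch.cat([cat_embeds, wildcard_embed[None]], dim=0)  # (K+1, D)
    q = F.normalize(query_feats, dim=-1)
    t = F.normalize(text, dim=-1)
    return q @ t.T / tau  # (N, K+1); last column = wildcard class

# Toy usage: 4 queries, 3 known categories, 256-d embedding space.
logits = classification_logits(torch.randn(4, 256), torch.randn(3, 256),
                               torch.randn(256))
print(logits.shape)  # torch.Size([4, 4])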

TODO

  • Release the Model Code
  • Release the Training and Evaluation Code
  • Update the RoQIs Selection Code

Environment

  • Linux with Python == 3.9.0
  • CUDA 11.7
  • The provided environment is suggested for reproducing our results; similar configurations may also work.

Quick Start

Create conda environment

conda create -n OV-DQUO python=3.9.0
conda activate OV-DQUO
pip install torch==2.0.0 torchvision==0.15.1

# other dependencies
pip install -r requirements.txt

Then install detectron2==0.6 as instructed in the official tutorial: https://detectron2.readthedocs.io/en/latest/tutorials/install.html
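
Optionally, you can sanity-check the installs so far (these import paths and version attributes are standard for torch, torchvision, and detectron2):

# Confirm the pinned versions installed and that PyTorch can see the GPU.
import torch, torchvision, detectron2
print("torch", torch.__version__, "| torchvision", torchvision.__version__,
      "| detectron2", detectron2.__version__)
print("CUDA available:", torch.cuda.is_available(),
      "| CUDA version:", torch.version.cuda)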

Install OpenCLIP

pip install -e . -v
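
Optionally, a quick import check (assuming the editable install above provides the open_clip package):

# Confirm the OpenCLIP install is importable.
import open_clip
print("open_clip", getattr(open_clip, "__version__", "(version attribute not found)"))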

Build for DeformableAttention

cd ./models/ops
sh ./make.sh
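
A hedged way to confirm the build (assuming the ops follow the Deformable-DETR convention, where the compiled extension module is named MultiScaleDeformableAttention; adjust the name if this repository differs):

# If the build succeeded, the compiled CUDA extension should import cleanly.
# The module name below follows the Deformable-DETR convention (our assumption).
import MultiScaleDeformableAttention  # noqa: F401
print("deformable attention ops: OK")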

Download backbone weights

Download the ResNet CLIP pretrained region prompt weights for OV-COCO experiments from CORA, and place them in the pretrained directory.

Download the ViT-B/16 and ViT-L/14 pretrained weights for OV-LVIS experiments from CLIPself, and place them in the pretrained directory.

Download text embeddings & precomputed wildcard embeddings (optional)

For the OV-LVIS experiment, you need to download the category name list, the precomputed text embeddings, and the wildcard embeddings from this link, and likewise place them in the pretrained directory.
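
A hypothetical loading sketch is shown below; the file names are placeholders we made up, not the actual names in the download, but precomputed embeddings of this kind are typically tensors readable with torch.load:

# Hypothetical loading sketch; the file names below are placeholders,
# not the actual names in the download.
import torch

text_embeds = torch.load("pretrained/lvis_text_embeddings.pt", map_location="cpu")
wildcard_embeds = torch.load("pretrained/wildcard_embeddings.pt", map_location="cpu")
print(type(text_embeds), type(wildcard_embeds))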

Prepare the datasets

Please download the COCO dataset, unzip it, place it in the data directory, and make sure it matches the following structure:

data/
  Annotations/
    instances_train2017.json
    instances_val2017.json
  Images/
    train2017
    val2017
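
As a quick sanity check, the snippet below (paths taken directly from the tree above) verifies the layout before training:

# Verify the expected COCO layout exists under data/.
from pathlib import Path

for p in [
    "data/Annotations/instances_train2017.json",
    "data/Annotations/instances_val2017.json",
    "data/Images/train2017",
    "data/Images/val2017",
]:
    print(("OK      " if Path(p).exists() else "MISSING ") + p)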

Please download the OV-COCO and OV-LVIS dataset annotations, and place them in the data/Annotations folder.

Prepare the open-world unknown objects

Download the open-world pseudo labels and place them in the ow_labels folder.

Script for training OV-DQUO

To train OV-DQUO on the OV-COCO dataset, please run one of the following scripts:

# dist training based on RN50 backbone, 8 GPU
bash scripts/OV-COCO/distrain_RN50.sh logs/r50_ovcoco

# dist training based on RN50x4 backbone, 8 GPU
bash scripts/OV-COCO/distrain_RN50x4.sh logs/r50x4_ovcoco

To train OV-DQUO on the OV-LVIS dataset, please run one of the following scripts:

# dist training based on ViT-B/16 backbone, 8 GPU
bash scripts/OV-LVIS/distrain_ViTB16.sh logs/vitb_ovlvis
# dist training based on ViT-L/14 backbone, 8 GPU
bash scripts/OV-LVIS/distrain_ViTL14.sh logs/vitl_ovlvis

Our code can also run on a single GPU; the corresponding run scripts are in the scripts folder. However, we have not tested this setting due to the long training time.

Since OV-LVIS evaluation is very time-consuming and would significantly prolong training, we adopt an offline evaluation method. After training, please run the following script to evaluate the results of each epoch:

# offline evaluation
python custom_tools/offline_lvis_eval.py -f logs/vitl_ovlvis -n 15 34 -c config/OV_LVIS/OVDQUO_ViTL14.py
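
In this command, -f points to the training log directory and -c to the model config; we read -n 15 34 as the first and last epoch checkpoints to evaluate, but please check custom_tools/offline_lvis_eval.py for the authoritative argument definitions.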

Results & Checkpoints

OV-COCO

Model name           AP50_Novel   Checkpoint
OVDQUO_RN50_COCO     39.2         model
OVDQUO_RN50x4_COCO   45.6         model

OV-LVIS

Model name             mAP_rare   Checkpoint
OVDQUO_ViT-B/16_LVIS   29.7       model
OVDQUO_ViT-L/14_LVIS   39.3       model

Evaluation

To evaluate our pretrained checkpoints on the OV-COCO dataset, please download the checkpoints from the links above, place them in the ckpt folder, and run:

# R50
bash scripts/OV-COCO/diseval_RN50.sh logs/r50_ovcoco_eval
# R50x4
bash scripts/OV-COCO/diseval_RN50x4.sh logs/r50x4_ovcoco_eval

To evaluate our pretrained checkpoints on the OV-LVIS dataset, please download the checkpoints from the links above, place them in the ckpt folder, and run:

# ViT-B
bash scripts/OV-LVIS/diseval_ViTB16.sh logs/vitb_ovlvis_eval
# ViT-L
bash scripts/OV-LVIS/diseval_ViTL14.sh logs/vitl_ovlvis_eval

Citation and Acknowledgement

Citation

If you find this repo useful, please consider citing our paper:

@misc{wang2024ovdquo,
      title={OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision}, 
      author={Junjie Wang and Bin Chen and Bin Kang and Yulin Li and YiChi Chen and Weizhi Xian and Huifeng Chang},
      year={2024},
      eprint={2405.17913},
      archivePrefix={arXiv},
      primaryClass={cs.CV}}

Acknowledgement

This repository was built on top of DINO, CORA, MEPU, and CLIPself. We thank the community for their efforts.
