OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision
This repository is the official PyTorch implementation of OV-DQUO.
OV-DQUO is an open-vocabulary detection framework that learns from open-world unknown objects through wildcard matching and contrastive denoising training methods, mitigating performance degradation in novel category detection caused by confidence bias.
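To give a rough intuition for the wildcard-matching idea, the toy sketch below scores region embeddings against known category embeddings plus one generic wildcard embedding. It is a conceptual illustration with random placeholder tensors, not the repository's implementation.

```python
# Conceptual sketch of wildcard matching (illustrative only, random placeholder tensors):
# regions from open-world unknown objects are scored against a generic "wildcard"
# text embedding in addition to the known category embeddings.
import torch
import torch.nn.functional as F

region_feats = torch.randn(5, 512)   # 5 region embeddings (toy values)
category_emb = torch.randn(3, 512)   # embeddings of known category names
wildcard_emb = torch.randn(1, 512)   # generic wildcard embedding

text_bank = torch.cat([category_emb, wildcard_emb], dim=0)            # (4, 512)
logits = F.normalize(region_feats, dim=-1) @ F.normalize(text_bank, dim=-1).T
print(logits.shape)  # (5, 4): the last column scores each region against the wildcard
```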
- Release the Model Code
- Release the Training and Evaluation Code
- Update the RoQIs Selection Code
- Linux with Python == 3.9.0
- CUDA 11.7
- The provided environment is recommended for reproducing our results; similar configurations may also work.
```bash
conda create -n OV-DQUO python=3.9.0
conda activate OV-DQUO
pip install torch==2.0.0 torchvision==0.15.1
# other dependencies
pip install -r requirements.txt
```
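Optionally, run a quick sanity check that the environment matches the versions listed above:

```python
# optional sanity check for the environment
import torch
import torchvision

print(torch.__version__)          # expected: 2.0.0 (CUDA 11.7 build)
print(torchvision.__version__)    # expected: 0.15.1
print(torch.version.cuda)         # expected: 11.7
print(torch.cuda.is_available())  # should be True before training
```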
Next, install detectron2==0.6 as instructed in the official tutorial (https://detectron2.readthedocs.io/en/latest/tutorials/install.html), then run:
```bash
pip install -e . -v
# compile the deformable attention CUDA operators
cd ./models/ops
sh ./make.sh
```
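If the build succeeds, detectron2 and the compiled operator should both be importable. The extension name below follows the usual Deformable-DETR/DINO convention for models/ops and is an assumption, not something stated in this README:

```python
# optional post-install check
# NOTE: the extension name is assumed from the Deformable-DETR / DINO convention
import detectron2
import MultiScaleDeformableAttention as MSDA  # built by models/ops/make.sh

print(detectron2.__version__)  # expected: 0.6
print(MSDA.__file__)           # location of the compiled CUDA extension
```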
Download the ResNet CLIP region prompt pretrained weights for the OV-COCO experiments from CORA, and place them in the pretrained directory.
Download the ViT-B/16 and ViT-L/14 pretrained weights for the OV-LVIS experiments from CLIPself, and place them in the pretrained directory.
For the OV-LVIS experiments, you also need to download the category name list, the pre-computed text embeddings, and the wildcard embedding from this link; place them in the pretrained directory as well.
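For reference, category text embeddings of this kind are typically obtained by encoding prompted category names with the CLIP text encoder. The sketch below uses the OpenAI clip package and placeholder category names; it illustrates the general recipe only and is not the exact script used to produce the provided files:

```python
# sketch: pre-computing category text embeddings with CLIP (illustrative only)
import torch
import clip  # pip install git+https://github.com/openai/CLIP.git

model, _ = clip.load("ViT-B/16", device="cpu")
categories = ["cat", "dog", "zebra"]  # placeholder category names
tokens = clip.tokenize([f"a photo of a {c}" for c in categories])
with torch.no_grad():
    text_emb = model.encode_text(tokens)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)  # L2-normalize
print(text_emb.shape)  # (3, 512) for ViT-B/16
```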
Please download the COCO dataset, unzip it, place it in the data directory, and make sure it has the following structure:

```
data/
  Annotations/
    instances_train2017.json
    instances_val2017.json
  Images/
    train2017
    val2017
```
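A small helper to confirm the layout, assuming the relative paths shown above:

```python
# check that the expected COCO files and folders are in place
from pathlib import Path

root = Path("data")
expected = [
    root / "Annotations" / "instances_train2017.json",
    root / "Annotations" / "instances_val2017.json",
    root / "Images" / "train2017",
    root / "Images" / "val2017",
]
for p in expected:
    print(f"{p}: {'OK' if p.exists() else 'MISSING'}")
```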
Please download the OV-COCO and OV-LVIS dataset annotations and place them in the data/Annotations folder.
Download the open-world pseudo labels and place them in the ow_labels folder.
To train OV-DQUO on the OV-COCO dataset, run one of the following scripts:

```bash
# distributed training with the RN50 backbone, 8 GPUs
bash scripts/OV-COCO/distrain_RN50.sh logs/r50_ovcoco
# distributed training with the RN50x4 backbone, 8 GPUs
bash scripts/OV-COCO/distrain_RN50x4.sh logs/r50x4_ovcoco
```
To train OV-DQUO on the OV-LVIS dataset, run one of the following scripts:

```bash
# distributed training with the ViT-B/16 backbone, 8 GPUs
bash scripts/OV-LVIS/distrain_ViTB16.sh logs/vitb_ovlvis
# distributed training with the ViT-L/14 backbone, 8 GPUs
bash scripts/OV-LVIS/distrain_ViTL14.sh logs/vitl_ovlvis
```
Our code can also run on a single GPU; the corresponding run scripts are in the scripts folder. However, we have not tested single-GPU training due to the long training time.
Since the OV-LVIS evaluation is very time-consuming and would significantly prolong training, we adopt an offline evaluation method. After training, run the following script to evaluate the results of each epoch:

```bash
# offline evaluation
python custom_tools/offline_lvis_eval.py -f logs/vitl_ovlvis -n 15 34 -c config/OV_LVIS/OVDQUO_ViTL14.py
```
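Conceptually, the offline evaluation walks over the per-epoch checkpoints saved in the log directory for the epoch range given by -n (here 15 to 34) and evaluates each one against the specified config. The loop below is only a rough sketch; the checkpoint naming and the evaluation entry point are assumptions, not the repository's actual interface:

```python
# rough sketch of the offline, per-epoch evaluation loop (names are hypothetical)
import subprocess
from pathlib import Path

log_dir = Path("logs/vitl_ovlvis")
config = "config/OV_LVIS/OVDQUO_ViTL14.py"

for epoch in range(15, 35):  # the "-n 15 34" epoch range
    ckpt = log_dir / f"checkpoint{epoch:04d}.pth"  # hypothetical naming scheme
    if not ckpt.exists():
        continue
    # hypothetical evaluation command; the real entry point lives in this repo
    subprocess.run(
        ["python", "main.py", "--eval", "-c", config, "--resume", str(ckpt)],
        check=True,
    )
```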
Results on OV-COCO:

Model name | AP50_Novel | Checkpoint
---|---|---
OVDQUO_RN50_COCO | 39.2 | model
OVDQUO_RN50x4_COCO | 45.6 | model

Results on OV-LVIS:

Model name | mAP_rare | Checkpoint
---|---|---
OVDQUO_ViT-B/16_LVIS | 29.7 | model
OVDQUO_ViT-L/14_LVIS | 39.3 | model
To evaluate our pretrained checkpoints on the OV-COCO dataset, download them from the links above, place them in the ckpt folder, and run:

```bash
# R50
bash scripts/OV-COCO/diseval_RN50.sh logs/r50_ovcoco_eval
# R50x4
bash scripts/OV-COCO/diseval_RN50x4.sh logs/r50x4_ovcoco_eval
```
To evaluate our pretrained checkpoints on the OV-LVIS dataset, download them from the links above, place them in the ckpt folder, and run:

```bash
# vit-b
bash scripts/OV-LVIS/diseval_ViTB16.sh logs/vitb_ovlvis_eval
# vit-l
bash scripts/OV-LVIS/diseval_ViTL14.sh logs/vitl_ovlvis_eval
```
If you find this repo useful, please consider citing our paper:
```bibtex
@misc{wang2024ovdquo,
  title={OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision},
  author={Junjie Wang and Bin Chen and Bin Kang and Yulin Li and YiChi Chen and Weizhi Xian and Huifeng Chang},
  year={2024},
  eprint={2405.17913},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```
This repository was built on top of DINO, CORA, MEPU, and CLIPself. We thank the community for their efforts.