Skip to content

YOLO-UniOW: Efficient Universal Open-World Object Detection

License

Notifications You must be signed in to change notification settings

THU-MIG/YOLO-UniOW

Repository files navigation

YOLO-UniOW: Efficient Universal Open-World Object Detection

The official implementation of YOLO-UniOW [arxiv]

yolo-uniow

Zero-shot Performance on LVIS Dataset

YOLO-UniOW-S/M/L has been pre-trained from scratch and evaluated on the LVIS minival. The pre-trained weights can be downloaded from the link provided below.

Model #Params APmini APr APc APf FPS (V100)
YOLO-UniOW-S 7.5M 26.2 24.1 24.9 27.7 98.3
YOLO-UniOW-M 16.2M 31.8 26.0 30.5 34 86.2
YOLO-UniOW-L 29.4M 34.6 30.0 33.6 36.3 64.8

Experiment Setup

Data Preparation

For preparing open-vocabulary and open-world data, please refer to docs/data.

Installation

Our model is built with CUDA 11.8 and PyTorch 2.1.2. To set up the environment, refer to the PyTorch official documentation for installation guidance. For detailed instructions on installing mmcv, please see docs/installation.

conda create -n yolouniow python=3.9
conda activate yolouniow
pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu118
pip install mmcv==2.1.0 -f https://download.openmmlab.com/mmcv/dist/cu118/torch2.1/index.html
pip install -r requirements.txt
pip install -e .

Training & Evaluation

For open-vocabulary model training and evaluation, please refer to run_ovod.sh

# Train Open-Vocabulary Model
./tools/dist_train.sh configs/pretrain/yolo_uniow_s_lora_bn_5e-4_100e_8gpus_obj365v1_goldg_train_lvis_minival.py 8 --amp

# Evaluate Open-Vocabulary Model
./tools/dist_test.sh configs/pretrain/yolo_uniow_s_lora_bn_5e-4_100e_8gpus_obj365v1_goldg_train_lvis_minival.py \
    pretrained/yolo_uniow_s_lora_bn_5e-4_100e_8gpus_obj365v1_goldg_train_lvis_minival.pth 8

For open-world model training and evaluation, please follow the steps provided in run_owod.sh. Ensure that the model is trained before proceeding with the evaluation. We provide our fine-tuned wildcard features, object_tuned_s and object_tuned_m, obtained through steps 2 and 3, allowing for direct use.

# 1. Extract text/wildcard features from pretrained model
python tools/owod_scripts/extract_text_feats.py --config $CONFIG --ckpt $CHECKPOINT --save_path $EMBEDS_PATH

# 2. Fine-tune wildcard features
./tools/dist_train.sh $OBJ_CONFIG 8 --amp

# 3. Extract fine-tuned wildcard features
python tools/owod_scripts/extract_text_feats.py --config $OBJ_CONFIG --save_path $EMBEDS_PATH --extract_tuned

# 4. Train all owod tasks
python tools/owod_scripts/train_owod_tasks.py MOWODB $OW_CONFIG $CHECKPOINT

# 5. Evaluate all owod tasks
python tools/owod_scripts/test_owod_tasks.py MOWODB $OW_CONFIG --save

To train and evaluate on specific datasets and tasks, use the commands below:

# Train owod task
DATASET=$DATASET TASK=$TASK THRESHOLD=$THRESHOLD SAVE=$SAVE \
./tools/dist_train_owod.sh $CONFIG 8 --amp

# Evaluate owod task
DATASET=$DATASET TASK=$TASK THRESHOLD=$THRESHOLD SAVE=$SAVE \
./tools/dist_test.sh $CONFIG $CHECKPOINT 8

Acknowledgement

This project builds upon YOLO-World, YOLOv10, FOMO, and OVOW. We sincerely thank the authors for their excellent implementations!

Citation

If our code or models help your work, please cite our paper and yolov10:

@article{liu2024yolouniow,
  title={YOLO-UniOW: Efficient Universal Open-World Object Detection},
  author={Liu, Lihao and Feng, Juexiao and Chen, Hui and Wang, Ao and Song, Lin and Han, Jungong and Ding, Guiguang},
  journal={arXiv preprint arXiv:2412.20645},
  year={2024}
}

@article{wang2024yolov10,
  title={YOLOv10: Real-Time End-to-End Object Detection},
  author={Wang, Ao and Chen, Hui and Liu, Lihao and Chen, Kai and Lin, Zijia and Han, Jungong and Ding, Guiguang},
  journal={arXiv preprint arXiv:2405.14458},
  year={2024}
}

About

YOLO-UniOW: Efficient Universal Open-World Object Detection

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published