Official implementation of online self-training and a split-and-fusion (SAF) head for Open-Vocabulary Object Detection (OVD), SAS-Det for short. This project was previously named Improving Pseudo Labels for Open-Vocabulary Object Detection.
- Our project is built on Detectron2. Please follow its official installation instructions, or use the commands below.
# create new environment
conda create -n sas_det python=3.8
conda activate sas_det
# install pytorch
conda install pytorch==1.12.0 torchvision==0.13.0 torchaudio==0.12.0 cudatoolkit=11.3 -c pytorch
# install Detectron2 from a local clone
git clone https://github.com/facebookresearch/detectron2.git
python -m pip install -e detectron2
- Install CLIP
# install CLIP
pip install scipy
pip install ftfy regex tqdm
pip install git+https://github.com/openai/CLIP.git
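As a quick sanity check after the installation steps above, the following sketch reports which of the installed packages Python can locate. The module list is an assumption derived from the commands above (`clip` is the import name of the openai/CLIP package), and the helper name is hypothetical. It only checks that the modules can be found, not that their versions or the CUDA toolkit match.

```python
import importlib.util

# Import names for the packages installed above (an assumed list).
REQUIRED_MODULES = [
    "torch", "torchvision", "torchaudio", "detectron2",
    "clip", "scipy", "ftfy", "regex", "tqdm",
]


def find_missing_modules(names):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Treat a broken or half-installed package as missing.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it inside the `sas_det` conda environment; any name it prints points to the install step that should be repeated.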
- Please follow RegionCLIP's dataset instructions to prepare COCO and LVIS datasets.
- Download the dataset metadata and put it in the `datasets` folder (i.e., `$DETECTRON2_DATASETS` used in the previous step). It is used in our evaluation and training.
- Download RegionCLIP's pretrained weights. See RegionCLIP's documentation for more details. Create a new folder `pretrained_ckpt` to hold those weights. In this repository, `regionclip`, `concept_emb` and `rpn` will be used.
- Download our pretrained weights and put them in the corresponding folders in `pretrained_ckpt`. Our pretrained weights include:
  - `r50_3x_pre_RegCLIP_cocoRPN_2`: RPN weights pretrained only with COCO Base categories. This is used for experiments on COCO to avoid potential data leakage.
  - `concept_emb`: Complementary to RegionCLIP's `concept_emb`.
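Before running evaluation, it can help to confirm that the folders are where the commands expect them. The sketch below is a hypothetical helper, not part of the repository. The folder names `regionclip`, `concept_emb` and `rpn` come from the list above, while `sas_det` and the two JSON metadata file names are taken from the paths used in the evaluation commands in this README. It assumes the default `./pretrained_ckpt` and `./datasets` locations relative to the repository root.

```python
from pathlib import Path

# Expected entries, collected from this README (see the lead-in for sources).
CKPT_SUBDIRS = ["regionclip", "concept_emb", "rpn", "sas_det"]
DATASET_METADATA = [
    "coco_ovd_continue_cat_ids.json",
    "lvis_ovd_continue_cat_ids.json",
]


def missing_entries(root, relative_paths):
    """Return the relative paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in relative_paths if not (root / rel).exists()]


if __name__ == "__main__":
    checks = (("pretrained_ckpt", CKPT_SUBDIRS), ("datasets", DATASET_METADATA))
    for root, expected in checks:
        missing = missing_entries(root, expected)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{root}: {status}")
```

If `$DETECTRON2_DATASETS` points somewhere other than `./datasets`, pass that path as `root` instead.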
| Configs | Novel AP | Base AP | Overall AP |
|---|---|---|---|
| w/o SAF head | 31.4 | 55.7 | 49.4 |
| with SAF head | 37.4 | 58.5 | 53.0 |
Evaluation on COCO without the SAF head (the baseline in the paper):
python3 ./test_net.py \
--num-gpus 8 \
--eval-only \
--config-file ./sas_det/configs/regionclip/COCO-InstanceSegmentation/customized/CLIP_fast_rcnn_R_50_C4_ovd_PLs.yaml \
MODEL.WEIGHTS ./pretrained_ckpt/sas_det/sas_det_coco_no_saf_head_baseline.pth \
MODEL.CLIP.OFFLINE_RPN_CONFIG ./sas_det/configs/regionclip/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml \
MODEL.CLIP.BB_RPN_WEIGHTS ./pretrained_ckpt/rpn/rpn_coco_48.pth \
MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_65_cls_emb.pth \
MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_65_cls_emb.pth \
MODEL.ROI_HEADS.SOFT_NMS_ENABLED True \
OUTPUT_DIR output/eval
Evaluation on COCO with the SAF head:
python3 ./test_net.py \
--num-gpus 8 \
--eval-only \
--config-file ./sas_det/configs/ovd_coco_R50_C4_ensemble_PLs.yaml \
MODEL.WEIGHTS ./pretrained_ckpt/sas_det/sas_det_coco.pth \
MODEL.CLIP.OFFLINE_RPN_CONFIG ./sas_det/configs/regionclip/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml \
MODEL.CLIP.BB_RPN_WEIGHTS ./pretrained_ckpt/rpn/rpn_coco_48.pth \
MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_48_base_cls_emb.pth \
MODEL.CLIP.CONCEPT_POOL_EMB ./pretrained_ckpt/concept_emb/my_coco_48_base_17_cls_emb.pth \
MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/coco_65_cls_emb.pth \
MODEL.ROI_HEADS.SOFT_NMS_ENABLED True \
MODEL.ENSEMBLE.TEST_CATEGORY_INFO "./datasets/coco_ovd_continue_cat_ids.json" \
MODEL.ENSEMBLE.ALPHA 0.3 MODEL.ENSEMBLE.BETA 0.7 \
OUTPUT_DIR output/eval
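The commands above share one shape: a few flags followed by `KEY VALUE` pairs that override config entries. For scripted runs, the same call can be assembled in Python. This is a minimal sketch under stated assumptions: `build_eval_command` is a hypothetical helper, not part of the repository, and the keys and paths are copied from the SAF head command above.

```python
import shlex


def build_eval_command(config_file, overrides, num_gpus=8, output_dir="output/eval"):
    """Assemble the test_net.py call as an argument list.

    Flags come first; the trailing KEY VALUE pairs override config entries.
    """
    cmd = [
        "python3", "./test_net.py",
        "--num-gpus", str(num_gpus),
        "--eval-only",
        "--config-file", config_file,
    ]
    for key, value in overrides.items():
        cmd.extend([key, str(value)])
    cmd.extend(["OUTPUT_DIR", output_dir])
    return cmd


CKPT = "./pretrained_ckpt"
CONFIGS = "./sas_det/configs"

# Overrides copied from the COCO command with the SAF head.
COCO_SAF_OVERRIDES = {
    "MODEL.WEIGHTS": f"{CKPT}/sas_det/sas_det_coco.pth",
    "MODEL.CLIP.OFFLINE_RPN_CONFIG": f"{CONFIGS}/regionclip/COCO-InstanceSegmentation/mask_rcnn_R_50_C4_1x_ovd_FSD.yaml",
    "MODEL.CLIP.BB_RPN_WEIGHTS": f"{CKPT}/rpn/rpn_coco_48.pth",
    "MODEL.CLIP.TEXT_EMB_PATH": f"{CKPT}/concept_emb/coco_48_base_cls_emb.pth",
    "MODEL.CLIP.CONCEPT_POOL_EMB": f"{CKPT}/concept_emb/my_coco_48_base_17_cls_emb.pth",
    "MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH": f"{CKPT}/concept_emb/coco_65_cls_emb.pth",
    "MODEL.ROI_HEADS.SOFT_NMS_ENABLED": True,
    "MODEL.ENSEMBLE.TEST_CATEGORY_INFO": "./datasets/coco_ovd_continue_cat_ids.json",
    "MODEL.ENSEMBLE.ALPHA": 0.3,
    "MODEL.ENSEMBLE.BETA": 0.7,
}

if __name__ == "__main__":
    command = build_eval_command(
        f"{CONFIGS}/ovd_coco_R50_C4_ensemble_PLs.yaml", COCO_SAF_OVERRIDES
    )
    # Print a shell-quoted version for inspection before running it.
    print(shlex.join(command))
```

The returned list can be passed to `subprocess.run(command, check=True)` from the repository root. Keeping the overrides in a dictionary makes it easy to swap the weights or embedding files without retyping the whole command.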
| Configs | APr | APc | APf | AP |
|---|---|---|---|---|
| RN50-C4 as backbone | 20.1 | 27.1 | 32.9 | 28.1 |
| RN50x4-C4 as backbone | 29.0 | 32.3 | 36.8 | 33.5 |
Evaluation on LVIS with RN50-C4 as the backbone:
python3 ./test_net.py \
--num-gpus 8 \
--eval-only \
--config-file ./sas_det/configs/ovd_lvis_R50_C4_ensemble_PLs.yaml \
MODEL.WEIGHTS ./pretrained_ckpt/sas_det/sas_det_lvis_r50.pth \
MODEL.CLIP.OFFLINE_RPN_CONFIG ./sas_det/configs/regionclip/LVISv1-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
MODEL.CLIP.BB_RPN_WEIGHTS ./pretrained_ckpt/rpn/rpn_lvis_866_lsj.pth \
MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/lvis_866_base_cls_emb.pth \
MODEL.CLIP.CONCEPT_POOL_EMB ./pretrained_ckpt/concept_emb/my_lvis_866_base_337_cls_emb.pth \
MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/lvis_1203_cls_emb.pth \
MODEL.CLIP.OFFLINE_RPN_LSJ_PRETRAINED True \
MODEL.ENSEMBLE.TEST_CATEGORY_INFO "./datasets/lvis_ovd_continue_cat_ids.json" \
MODEL.ENSEMBLE.ALPHA 0.33 MODEL.ENSEMBLE.BETA 0.67 \
OUTPUT_DIR output/eval
Evaluation on LVIS with RN50x4-C4 as the backbone:
python3 ./test_net.py \
--num-gpus 8 \
--eval-only \
--config-file ./sas_det/configs/ovd_lvis_R50_C4_ensemble_PLs.yaml \
MODEL.WEIGHTS ./pretrained_ckpt/sas_det/sas_det_lvis_r50x4.pth \
MODEL.CLIP.OFFLINE_RPN_CONFIG ./sas_det/configs/regionclip/LVISv1-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml \
MODEL.CLIP.BB_RPN_WEIGHTS ./pretrained_ckpt/rpn/rpn_lvis_866_lsj.pth \
MODEL.CLIP.TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/lvis_866_base_cls_emb_rn50x4.pth \
MODEL.CLIP.CONCEPT_POOL_EMB ./pretrained_ckpt/concept_emb/my_lvis_866_base_337_cls_emb_rn50x4.pth \
MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH ./pretrained_ckpt/concept_emb/lvis_1203_cls_emb_rn50x4.pth \
MODEL.CLIP.OFFLINE_RPN_LSJ_PRETRAINED True \
MODEL.CLIP.TEXT_EMB_DIM 640 \
MODEL.RESNETS.DEPTH 200 \
MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION 18 \
MODEL.ROI_MASK_HEAD.POOLER_RESOLUTION 18 \
MODEL.ENSEMBLE.TEST_CATEGORY_INFO "./datasets/lvis_ovd_continue_cat_ids.json" \
MODEL.ENSEMBLE.ALPHA 0.33 MODEL.ENSEMBLE.BETA 0.67 \
OUTPUT_DIR output/eval
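The two LVIS commands use the same config file and differ only in their overrides, which is easy to miss when reading them side by side. The sketch below makes the difference explicit. The values are copied from the two commands above, and `diff_overrides` is a hypothetical helper written for this illustration.

```python
CKPT = "./pretrained_ckpt"
EMB = f"{CKPT}/concept_emb"

# Settings shared by both LVIS commands.
SHARED = {
    "MODEL.CLIP.OFFLINE_RPN_CONFIG": "./sas_det/configs/regionclip/LVISv1-InstanceSegmentation/mask_rcnn_R_50_FPN_1x.yaml",
    "MODEL.CLIP.BB_RPN_WEIGHTS": f"{CKPT}/rpn/rpn_lvis_866_lsj.pth",
    "MODEL.CLIP.OFFLINE_RPN_LSJ_PRETRAINED": True,
    "MODEL.ENSEMBLE.TEST_CATEGORY_INFO": "./datasets/lvis_ovd_continue_cat_ids.json",
    "MODEL.ENSEMBLE.ALPHA": 0.33,
    "MODEL.ENSEMBLE.BETA": 0.67,
}

RN50 = {
    **SHARED,
    "MODEL.WEIGHTS": f"{CKPT}/sas_det/sas_det_lvis_r50.pth",
    "MODEL.CLIP.TEXT_EMB_PATH": f"{EMB}/lvis_866_base_cls_emb.pth",
    "MODEL.CLIP.CONCEPT_POOL_EMB": f"{EMB}/my_lvis_866_base_337_cls_emb.pth",
    "MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH": f"{EMB}/lvis_1203_cls_emb.pth",
}

RN50X4 = {
    **SHARED,
    "MODEL.WEIGHTS": f"{CKPT}/sas_det/sas_det_lvis_r50x4.pth",
    "MODEL.CLIP.TEXT_EMB_PATH": f"{EMB}/lvis_866_base_cls_emb_rn50x4.pth",
    "MODEL.CLIP.CONCEPT_POOL_EMB": f"{EMB}/my_lvis_866_base_337_cls_emb_rn50x4.pth",
    "MODEL.CLIP.OPENSET_TEST_TEXT_EMB_PATH": f"{EMB}/lvis_1203_cls_emb_rn50x4.pth",
    "MODEL.CLIP.TEXT_EMB_DIM": 640,
    "MODEL.RESNETS.DEPTH": 200,
    "MODEL.ROI_BOX_HEAD.POOLER_RESOLUTION": 18,
    "MODEL.ROI_MASK_HEAD.POOLER_RESOLUTION": 18,
}


def diff_overrides(base, other):
    """Return (added, changed): keys only in other, and keys whose value differs."""
    added = {k: v for k, v in other.items() if k not in base}
    changed = {
        k: (base[k], v) for k, v in other.items() if k in base and base[k] != v
    }
    return added, changed


if __name__ == "__main__":
    added, changed = diff_overrides(RN50, RN50X4)
    print("Only set for RN50x4:")
    for key, value in added.items():
        print(f"  {key} = {value}")
    print("Different between the two backbones:")
    for key, (old, new) in changed.items():
        print(f"  {key}: {old} -> {new}")
```

In short, the RN50x4 run swaps the checkpoint and the three text embedding files for their `rn50x4` variants and adds four architecture overrides. Everything else, including the RPN settings and the ensemble weights, stays the same.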
This repository was built on top of Detectron2, RegionCLIP, and VLDet. We thank the community for its efforts.