In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation




This repo is the official implementation of the ECCV 2024 paper "In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation".

Conda installation command

conda env create -f environment.yml --prefix $YOURPREFIX

$YOURPREFIX is typically /home/$USER/anaconda3

Dependencies

This repo is built on CLIP, SCLIP, and MMSegmentation.

mim install mmcv==2.0.1 mmengine==0.8.4 mmsegmentation==1.1.1
pip install ftfy regex yapf==0.40.1

Dataset preparation

Prepare the Pascal VOC 2012, Pascal Context, COCO-Stuff 164K, COCO-Object, ADEChallengeData2016, and Cityscapes datasets following the MMSeg data preparation guide. The COCO-Object dataset can be converted from COCO-Stuff164k by executing the following command:

python datasets/cvt_coco_object.py PATH_TO_COCO_STUFF164K -o PATH_TO_COCO164K

Place the datasets under the $yourdatasetroot/ directory such that:

    $yourdatasetroot/
    ├── ADEChallengeData2016/
    │   ├── annotations/
    │   ├── images/
    │   ├── ...
    ├── VOC2012/
    │   ├── Annotations/
    │   ├── JPEGImages/
    │   ├── ...
    ├── coco_stuff164k/
    │   ├── annotations/
    │   ├── images/
    │   ├── ...
    ├── Cityscapes/
    │   ├── gtFine/
    │   ├── leftImg8bit/
    │   ├── ...
    ├── ...
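Before running either stage, it can help to confirm that the dataset root matches the tree above. The following is a minimal sketch, not part of this repo: `EXPECTED_LAYOUT` and `find_missing_dirs` are hypothetical names, and only the sub-directories explicitly listed in the tree are checked (the `...` entries are ignored).

```python
from pathlib import Path

# Sub-directories taken from the tree above; "..." entries are not checked.
EXPECTED_LAYOUT = {
    "ADEChallengeData2016": ["annotations", "images"],
    "VOC2012": ["Annotations", "JPEGImages"],
    "coco_stuff164k": ["annotations", "images"],
    "Cityscapes": ["gtFine", "leftImg8bit"],
}


def find_missing_dirs(dataset_root):
    """Return the expected directories (relative paths) missing under dataset_root."""
    root = Path(dataset_root)
    missing = []
    for dataset, subdirs in EXPECTED_LAYOUT.items():
        for sub in subdirs:
            if not (root / dataset / sub).is_dir():
                missing.append(f"{dataset}/{sub}")
    return missing


if __name__ == "__main__":
    # Hypothetical path; replace with your own $yourdatasetroot.
    for path in find_missing_dirs("/path/to/yourdatasetroot"):
        print(f"missing: {path}")
```

An empty result only means the listed directories exist; it does not verify their contents.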

1) Panoptic Cut for unsupervised object mask discovery

cd panoptic_cut
python predict.py \
    --logs panoptic_cut \
    --dataset {coco_object, coco_stuff, ade20k, voc21, voc20, context60, context59, cityscapes} \
    --datasetroot $yourdatasetroot
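The braces in `--dataset {...}` list the accepted benchmark ids; one id is passed per run. As a rough sketch, the per-benchmark command lines could be generated as follows. `build_predict_command` is a hypothetical helper, and the flags are copied from the command above rather than from the script's argument parser.

```python
import shlex

# Benchmark ids as listed in the --dataset choices above.
DATASETS = [
    "coco_object", "coco_stuff", "ade20k", "voc21",
    "voc20", "context60", "context59", "cityscapes",
]


def build_predict_command(dataset, dataset_root, logs="panoptic_cut"):
    """Build the argv list for one stage-1 run (to be executed inside panoptic_cut/)."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown benchmark id: {dataset}")
    return [
        "python", "predict.py",
        "--logs", logs,
        "--dataset", dataset,
        "--datasetroot", dataset_root,
    ]


if __name__ == "__main__":
    # Hypothetical dataset root; prints one shell command per benchmark id.
    for name in DATASETS:
        print(shlex.join(build_predict_command(name, "/path/to/yourdatasetroot")))
```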

The checkpoints for panoptic mask discovery are available on Google Drive:

| Mask prediction root after stage 1) | Benchmark id | Google Drive link |
| --- | --- | --- |
| coco_stuff164k | coco_object, coco_stuff164k | link to download (84.5 MB) |
| VOC2012 | context59, context60, voc20, voc21 | link to download (66.7 MB) |
| ADEChallengeData2016 | ade20k | link to download (29.4 MB) |
| Cityscapes | cityscapes | link to download (23.1 MB) |

Place them under the lavg/panoptic_cut/pred/ directory such that:

    lavg/panoptic_cut/pred/panoptic_cut/
    ├── ADEChallengeData2016/
    │   ├── ADE_val_00000001.pth
    │   ├── ADE_val_00000002.pth
    │   ├── ...
    ├── VOC2012/
    │   ├── 2007_000033.pth
    │   ├── 2007_000042.pth
    │   ├── ...
    ├── coco_stuff164k/
    │   ├── 000000000139.pth
    │   ├── 000000000285.pth
    │   ├── ...
    ├── Cityscapes/
    │   ├── frankfurt_000000_000294_leftImg8bit.pth
    │   ├── ...
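In the listing above, each prediction file appears to be named after its image (for example `2007_000033.pth`). Under that assumption, a small sketch like the one below can report images that have no matching prediction. `find_missing_predictions` is a hypothetical helper, and the image extensions it looks for are also an assumption.

```python
from pathlib import Path


def find_missing_predictions(image_dir, pred_dir, image_suffixes=(".jpg", ".png")):
    """Return sorted image stems under image_dir with no <stem>.pth in pred_dir.

    Images are searched recursively, since some datasets nest them in sub-folders.
    """
    pred_stems = {p.stem for p in Path(pred_dir).glob("*.pth")}
    image_stems = {
        p.stem
        for p in Path(image_dir).rglob("*")
        if p.is_file() and p.suffix.lower() in image_suffixes
    }
    return sorted(image_stems - pred_stems)
```

Pass only the image folder of the split being evaluated, otherwise images from other splits will be reported as missing.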

2) Visual grounding & Segmentation evaluation

Update $yourdatasetroot in configs/cfg_*.py

cd lavg
python eval.py --config ./configs/{cfg_context59/cfg_context60/cfg_voc20/cfg_voc21}.py --maskpred_root VOC2012/panoptic_cut
python eval.py --config ./configs/cfg_ade20k.py --maskpred_root ADEChallengeData2016/panoptic_cut
python eval.py --config ./configs/{cfg_coco_object/cfg_coco_stuff164k}.py --maskpred_root coco_stuff164k/panoptic_cut
python eval.py --config ./configs/cfg_city_scapes.py --maskpred_root Cityscapes/panoptic_cut
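The four commands above pair each config with a mask prediction root. The sketch below spells that pairing out as a lookup table. The dictionary and `build_eval_command` are hypothetical conveniences, with config names and roots copied from the commands above.

```python
# Config name -> --maskpred_root value, as given in the commands above.
MASKPRED_ROOT = {
    "cfg_context59": "VOC2012/panoptic_cut",
    "cfg_context60": "VOC2012/panoptic_cut",
    "cfg_voc20": "VOC2012/panoptic_cut",
    "cfg_voc21": "VOC2012/panoptic_cut",
    "cfg_ade20k": "ADEChallengeData2016/panoptic_cut",
    "cfg_coco_object": "coco_stuff164k/panoptic_cut",
    "cfg_coco_stuff164k": "coco_stuff164k/panoptic_cut",
    "cfg_city_scapes": "Cityscapes/panoptic_cut",
}


def build_eval_command(config_name):
    """Build the argv list for one stage-2 run (to be executed inside lavg/)."""
    root = MASKPRED_ROOT[config_name]  # raises KeyError for unknown configs
    return [
        "python", "eval.py",
        "--config", f"./configs/{config_name}.py",
        "--maskpred_root", root,
    ]


if __name__ == "__main__":
    for name in MASKPRED_ROOT:
        print(" ".join(build_eval_command(name)))
```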

The evaluation runs on a single GPU.

Quantitative performance (mIoU, %) on open-vocabulary semantic segmentation benchmarks

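VOC21, Context60, and COCO-obj are evaluated with a background category; VOC20, Context59, ADE, COCO-stuff, and Cityscapes are evaluated without one.

| Method | VOC21 | Context60 | COCO-obj | VOC20 | Context59 | ADE | COCO-stuff | Cityscapes |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LaVG | 62.1 | 31.6 | 34.2 | 82.5 | 34.7 | 15.8 | 23.2 | 26.2 |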
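For readers unfamiliar with the metric, mIoU averages the per-class intersection-over-union, TP / (TP + FP + FN), and is reported here as a percentage. The following is a generic sketch of that definition in plain Python. It is not this repo's evaluation code, and skipping classes that never occur is one common convention assumed here.

```python
def mean_iou(confusion):
    """Compute mIoU (%) from a square confusion matrix.

    confusion[i][j] is the number of pixels whose ground-truth class is i
    and whose predicted class is j.
    """
    ious = []
    for c in range(len(confusion)):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                  # class c pixels predicted as another class
        fp = sum(row[c] for row in confusion) - tp   # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both ground truth and prediction
            ious.append(tp / denom)
    return 100.0 * sum(ious) / len(ious) if ious else float("nan")
```

For a two-class matrix `[[3, 1], [1, 3]]`, each class has IoU 3 / 5, so `mean_iou` returns 60.0.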

Related repos

Our project refers to and heavily borrows code from the following repos:

Acknowledgements

This work was supported by Samsung Electronics (IO201208-07822-01), the NRF grant (NRF-2021R1A2C3012728 (45%)), and the IITP grants (RS-2022-II220959: Few-Shot Learning of Causal Inference in Vision and Language for Decision Making (50%), RS-2019-II191906: AI Graduate School Program at POSTECH (5%)) funded by the Ministry of Science and ICT, Korea. We also thank Sua Choi for her helpful discussion.

BibTeX source

If you find our code or paper useful, please consider citing our paper:

@inproceedings{kang2024lazy,
  title={In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation},
  author={Kang, Dahyun and Cho, Minsu},
  booktitle={European Conference on Computer Vision},
  year={2024}
}