This repository stores the code for the Global-Local CLIP algorithm for zero-shot referring image segmentation.
The performance of Global-Local CLIP with SAM as the mask generator is reported in the paper "Pseudo-RIS".
Zero-shot Referring Image Segmentation with Global-Local Context Features
Seonghoon Yu, Paul Hongsuck Seo, Jeany Son
AI graduate school, GIST and Google Research
CVPR 2023
paper | arxiv | video | poster | tutorial | bibtex
# Create the conda environment
conda create -n zsref python=3.8
# activate the environment
conda activate zsref
# Install PyTorch 1.10 with GPU support
conda install pytorch==1.10.0 torchvision==0.11.0 torchaudio==0.10.0 cudatoolkit=11.3 -c pytorch -c conda-forge
# Install spacy for language processing
conda install -c conda-forge spacy
pip install pydantic==1.10.11 --upgrade
python -m spacy download en_core_web_lg
# Install required packages
pip install opencv-python
pip install scikit-image
pip install h5py
conda install -c conda-forge einops
pip install markupsafe==2.0.1
# Install the modified CLIP in development mode
cd third_parth
cd modified_CLIP
pip install -e .
# Install detectron2 for FreeSOLO
cd ..
cd old_detectron2
pip install -e .
pip install pillow==9.5.0
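Once the steps above have finished, a quick import check can confirm that the main packages are visible inside the `zsref` environment. This is a minimal sketch and not part of the repository: the helper name `check_module` is hypothetical, and the import names (`clip` for the modified CLIP, `cv2` for opencv-python, `skimage` for scikit-image) are assumptions based on how these packages are usually imported.

```shell
#!/bin/sh
# Hypothetical helper: print "ok <module>" if the module imports,
# otherwise print "MISSING <module>".
check_module() {
    if python -c "import $1" >/dev/null 2>&1; then
        echo "ok $1"
    else
        echo "MISSING $1"
    fi
}

# Import names are assumptions based on common package naming.
for mod in torch torchvision spacy cv2 skimage h5py einops clip detectron2; do
    check_module "$mod"
done
```

Any line starting with `MISSING` points at an install step that may need to be repeated.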
We use FreeSOLO, an unsupervised instance segmentation model, as the mask generator.
mkdir checkpoints
cd checkpoints
wget https://cloudstor.aarnet.edu.au/plus/s/V8C0onE5H63x3RD/download
mv download FreeSOLO_R101_30k_pl.pth
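Because `wget` saves whatever the server returns under the name `download`, it may be worth checking that the renamed file exists and is not empty before running the evaluation. The sketch below is illustrative only: `check_checkpoint` is a hypothetical helper, and the relative path assumes the script is run from the directory in which `checkpoints` was created.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the file exists and is non-empty.
check_checkpoint() {
    if [ -s "$1" ]; then
        echo "found $1"
    else
        echo "missing or empty: $1" >&2
        return 1
    fi
}

# Assumed location relative to the current directory; "|| true" keeps the
# check informational instead of aborting the script.
check_checkpoint checkpoints/FreeSOLO_R101_30k_pl.pth || true
```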
We follow the dataset setup in LAVT.
In the "./refer/data/images/mscoco/images" directory:
wget http://images.cocodataset.org/zips/train2014.zip
unzip train2014.zip
In the "./refer/data" directory:
# RefCOCO
wget https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco.zip
unzip refcoco.zip
# RefCOCO+
wget https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcoco+.zip
unzip refcoco+.zip
# RefCOCOg
wget https://bvisionweb1.cs.unc.edu/licheng/referit/data/refcocog.zip
unzip refcocog.zip
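After downloading and unpacking, the resulting layout can be checked in one pass. The following is a sketch under a stated assumption: each archive is expected to unpack into a directory named after itself (`train2014`, `refcoco`, `refcoco+`, `refcocog`). The helper name `missing_dataset_dirs` is hypothetical and not part of the repository.

```shell
#!/bin/sh
# Hypothetical helper: print every expected directory that is missing
# under the given data root. The expected layout is an assumption.
missing_dataset_dirs() {
    root="$1"
    for d in images/mscoco/images/train2014 refcoco refcoco+ refcocog; do
        [ -d "$root/$d" ] || echo "$root/$d"
    done
    return 0
}

# Run from the repository root; no output means all directories were found.
missing_dataset_dirs ./refer/data
```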
To evaluate a model's performance on RefCOCO variants, use
python Our_method_with_free_solo.py --dataset refcoco --split val
Options:
--dataset: refcoco, refcoco+, refcocog
--split: val, testA, testB for refcoco and refcoco+; val, test for refcocog
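To cover every dataset and split in one go, the evaluation commands can be generated in a loop. This is a minimal sketch, not a script shipped with the repository: `eval_commands` is a hypothetical helper, and it assumes that refcoco+ uses the same splits as refcoco (val, testA, testB).

```shell
#!/bin/sh
# Hypothetical helper: print one evaluation command per dataset/split pair.
# Assumption: refcoco+ shares the refcoco splits (val, testA, testB).
eval_commands() {
    for dataset in refcoco refcoco+ refcocog; do
        case "$dataset" in
            refcocog) splits="val test" ;;
            *)        splits="val testA testB" ;;
        esac
        for split in $splits; do
            echo "python Our_method_with_free_solo.py --dataset $dataset --split $split"
        done
    done
}

# Print the commands for review; nothing is executed here.
eval_commands
```

Piping the output to a shell (`eval_commands | sh`) would run the evaluations one after another.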
Please consider citing our paper in your publications if our findings help your research.
@InProceedings{Yu_2023_CVPR,
author = {Yu, Seonghoon and Seo, Paul Hongsuck and Son, Jeany},
title = {Zero-Shot Referring Image Segmentation With Global-Local Context Features},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2023},
pages = {19456-19465}
}
This code builds upon several public repositories.
Thanks to their authors.