Pseudo-RIS

Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation
Seonghoon Yu, +Paul Hongsuck Seo, +Jeany Son (+ corresponding authors)
AI graduate school, GIST and Korea University
ECCV 2024

Abstract
We propose a new framework that automatically generates high-quality segmentation masks with their referring expressions as pseudo supervisions for referring image segmentation (RIS). These pseudo supervisions allow the training of any supervised RIS methods without the cost of manual labeling. To achieve this, we incorporate existing segmentation and image captioning foundation models, leveraging their broad generalization capabilities. However, the naive incorporation of these models may generate non-distinctive expressions that do not distinctively refer to the target masks. To address this challenge, we propose two-fold strategies that generate distinctive captions: 1) 'distinctive caption sampling', a new decoding method for the captioning model, to generate multiple expression candidates with detailed words focusing on the target. 2) 'distinctiveness-based text filtering' to further validate the candidates and filter out those with a low level of distinctiveness. These two strategies ensure that the generated text supervisions can distinguish the target from other objects, making them appropriate for the RIS annotations. Our method significantly outperforms both weakly and zero-shot SoTA methods on the RIS benchmark datasets. It also surpasses fully supervised methods in unseen domains, proving its capability to tackle the open-world challenge within RIS. Furthermore, integrating our method with human annotations yields further improvements, highlighting its potential in semi-supervised learning applications.

Installation

1. Environment

# create conda env
conda create -n pseudo_ris python=3.9

# activate the environment
conda activate pseudo_ris

# Install PyTorch
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch

# Install required packages
pip install pydantic==1.10.11 --upgrade
conda install -c conda-forge spacy
python -m spacy download en_core_web_lg
conda install -c anaconda pandas
pip install opencv-python
pip install lmdb
pip install pyarrow==11.0.0
pip install colored
pip install pycocotools

pip install transformers==4.31

2. Third Party

# Run each block below from the repository root.
# Install CoCa in dev mode, where distinctive caption sampling is implemented.
cd third_party/open_clip
pip install -e .

# Install detectron2 for CutLER 
cd third_party/detectron2
pip install -e .

# Install CLIP
cd third_party/CLIP
pip install -e .

# Install SAM in dev mode
cd segment-anything
pip install -e .
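
After the four editable installs, a quick import check can catch a failed step early. The snippet below is a minimal sketch, assuming the packages expose the import names `open_clip`, `detectron2`, `clip`, and `segment_anything`; the helper `find_missing` is hypothetical and not part of this repository.

```python
import importlib.util

# Import names assumed for the four third-party packages installed above.
THIRD_PARTY_MODULES = ["open_clip", "detectron2", "clip", "segment_anything"]


def find_missing(modules):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(THIRD_PARTY_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All third-party packages are importable.")
```

Run it inside the activated `pseudo_ris` environment. `find_spec` only locates each package without importing it, so the check stays fast and does not require a GPU.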

3. Download pre-trained weights

We use the pre-trained weights for (1) CoCa, (2) SAM, and (3) CutLER.

For CoCa

Note that the official CoCa repository offers a model pre-trained on LAION-2B.

We fine-tune it on the CC3M dataset.

We provide the CoCa weights pre-trained on LAION-2B and fine-tuned on CC3M at this URL.

Place the checkpoint in ./third_party/open_clip/src/logs/laion_cc3m/checkpoints/

For SAM

We use the SAM ViT-H model.

# Download SAM ViT-H model.
cd segment-anything
mkdir checkpoints
cd checkpoints
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth

For CutLER

We use CutLER to reduce the excessive number of SAM masks, including over-segmented ones, which prevents OOM issues, as described in the supplementary material and implementation details of our paper.

cd third_party/CuTLER/cutler/
mkdir checkpoints
cd checkpoints
wget http://dl.fbaipublicfiles.com/cutler/checkpoints/cutler_cascade_final.pth
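
Before running the generation scripts, it can help to confirm that all three sets of weights are where the commands above put them. The following is a minimal sketch, assuming it is given the repository root; `missing_checkpoints` is a hypothetical helper. The CoCa entry is checked as a non-empty directory because the checkpoint file name is not stated in this README.

```python
from pathlib import Path

# Paths taken from the download instructions above, relative to the repository root.
# The CoCa entry is a directory because the checkpoint file name is not stated here.
EXPECTED = [
    Path("third_party/open_clip/src/logs/laion_cc3m/checkpoints"),
    Path("segment-anything/checkpoints/sam_vit_h_4b8939.pth"),
    Path("third_party/CuTLER/cutler/checkpoints/cutler_cascade_final.pth"),
]


def missing_checkpoints(root, expected=EXPECTED):
    """Return the expected paths that are absent, or are empty directories, under root."""
    root = Path(root)
    missing = []
    for rel in expected:
        path = root / rel
        if not path.exists():
            missing.append(rel)
        elif path.is_dir() and not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    for rel in missing_checkpoints("."):
        print("Missing:", rel)
```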

Dataset

We follow the dataset setup of ETRIS to obtain the unlabeled images in the RefCOCO+ training set.

├── datasets
│   ├── images
│   │   └── train2014
│   │       ├── COCO_train2014_000000000009.jpg
│   │       └── ...
│   └── lmdb
│       └── refcoco+
│           ├── train.lmdb
│           └── ...
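
The layout above can be verified with a short script before any masks are generated. This is a minimal sketch, assuming the `datasets` directory sits in the current working directory; `check_dataset_layout` is a hypothetical helper, and it only checks the entries that the tree names explicitly.

```python
from pathlib import Path


def check_dataset_layout(root="datasets"):
    """Return a list of human-readable problems; an empty list means the layout looks right."""
    root = Path(root)
    problems = []
    image_dir = root / "images" / "train2014"
    lmdb_path = root / "lmdb" / "refcoco+" / "train.lmdb"
    if not image_dir.is_dir():
        problems.append(f"missing image directory: {image_dir}")
    elif next(image_dir.glob("COCO_train2014_*.jpg"), None) is None:
        problems.append(f"no COCO_train2014_*.jpg files in {image_dir}")
    if not lmdb_path.exists():
        problems.append(f"missing LMDB: {lmdb_path}")
    return problems


if __name__ == "__main__":
    for problem in check_dataset_layout():
        print(problem)
```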

Generate pseudo RIS annotations

1. Generate pseudo masks

We produce pseudo masks using SAM and CutLER, as described in the implementation details and supplementary material of our paper.

Pseudo masks are saved in the './datasets/pseudo_masks/cutler_sam' directory.

python generate_masks/cutler_sam_masks.py
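
To give an intuition for how CutLER detections could be used to thin out SAM's masks, the sketch below keeps, for each CutLER mask, only the SAM mask that overlaps it best. This is an illustration under stated assumptions and is not taken from `cutler_sam_masks.py`: the IoU-based selection rule, the threshold of 0.5, and the helpers `rect`, `iou`, and `select_sam_masks` are all hypothetical. Masks are represented as sets of pixel coordinates to keep the example dependency-free.

```python
def rect(r0, r1, c0, c1):
    """Build a toy rectangular mask as a set of (row, col) pixels."""
    return {(r, c) for r in range(r0, r1) for c in range(c0, c1)}


def iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = len(mask_a | mask_b)
    return len(mask_a & mask_b) / union if union else 0.0


def select_sam_masks(sam_masks, cutler_masks, iou_threshold=0.5):
    """Return indices of SAM masks that best match some CutLER mask.

    Hypothetical selection rule for illustration; the threshold is an assumed value.
    """
    kept = []
    for cutler_mask in cutler_masks:
        best_index, best_iou = None, 0.0
        for index, sam_mask in enumerate(sam_masks):
            score = iou(sam_mask, cutler_mask)
            if score > best_iou:
                best_index, best_iou = index, score
        if best_index is not None and best_iou >= iou_threshold and best_index not in kept:
            kept.append(best_index)
    return kept


if __name__ == "__main__":
    # One whole-object mask, two over-segmented halves, and an unrelated small region.
    sam = [rect(0, 4, 0, 4), rect(0, 4, 0, 2), rect(0, 4, 2, 4), rect(6, 7, 6, 7)]
    cutler = [rect(0, 4, 0, 4)]
    print(select_sam_masks(sam, cutler))
```

In this toy case the two half-object masks and the unrelated region are discarded, which mirrors the stated goal of dropping over-segmented and excessive SAM masks.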

2. Generate distinctive referring expressions on each pseudo mask.

Pseudo referring texts are saved in './pseudo_supervision/cutler_sam/distinctive_captions_cc3m.csv'.

python generate_pseudo_supervision/distinctive_caption_generation.py
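
The distinctiveness-based text filtering described in the abstract can be pictured as a margin test: a caption is kept only if it matches the target mask clearly better than any other mask in the image. The sketch below is a simplified illustration, not the logic of `distinctive_caption_generation.py`. It assumes the text-to-region similarity scores have already been computed by some model, and the margin rule, the `min_margin` value, and both function names are hypothetical.

```python
def distinctiveness(target_score, other_scores):
    """Margin between a caption's score on the target and its best score on any other mask."""
    return target_score - max(other_scores, default=0.0)


def filter_captions(candidates, min_margin=0.05):
    """Keep captions whose distinctiveness margin is at least min_margin.

    candidates: list of (caption, target_score, other_scores) tuples, where the scores
    are assumed to be precomputed text-to-region similarities (hypothetical inputs).
    """
    return [
        caption
        for caption, target_score, other_scores in candidates
        if distinctiveness(target_score, other_scores) >= min_margin
    ]


if __name__ == "__main__":
    candidates = [
        ("a dog", 0.30, [0.29, 0.10]),                      # also fits another mask
        ("the brown dog on the left", 0.34, [0.21, 0.10]),  # clearly favors the target
    ]
    print(filter_captions(candidates))
```

A generic caption such as "a dog" scores almost equally on a second dog in the image, so its margin is small and it is filtered out, while the more detailed caption survives.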

Citation

@inproceedings{yu2024pseudoris,
    title={Pseudo-RIS: Distinctive Pseudo-supervision Generation for Referring Image Segmentation},
    author={Seonghoon Yu and Paul Hongsuck Seo and Jeany Son},
    booktitle={Proceedings of the European Conference on Computer Vision},
    year={2024}
}

Acknowledgements

We thank the authors of the open-source foundation models (CoCa, SAM, CLIP).
