MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation

Minhyun Lee¹,², Seungho Lee¹, Song Park², Dongyoon Han², Byeongho Heo², Hyunjung Shim³

¹Yonsei University, ²NAVER AI LAB, ³KAIST

Official PyTorch implementation of "MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation" (arXiv:2411.19067).

Abstract

Referring Image Segmentation (RIS) is a vision-language task that identifies and segments objects in images based on free-form text descriptions. This study investigates effective data augmentation strategies and proposes a novel framework called Masked Referring Image Segmentation (MaskRIS). MaskRIS employs image and text masking to improve model robustness against occlusions and incomplete information. Experimental results show that MaskRIS can be integrated into existing RIS models and achieves state-of-the-art performance on the RefCOCO, RefCOCO+, and RefCOCOg datasets in both fully supervised and weakly supervised settings.

Updates

Nov 28, 2024: The arXiv paper is released.

Requirements

This code has been tested with:

- Python 3.8
- PyTorch 1.11.0

Other dependencies are listed in requirements.txt.
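A minimal environment-setup sketch, assuming a `python3.8` interpreter is on PATH and that it is run from the repository root. The function name `setup_maskris_env` is hypothetical (not part of the repo), and a plain `pip install torch==1.11.0` pulls PyTorch's default build, so you may need a CUDA-specific 1.11.0 wheel from the PyTorch package index instead.

```shell
# Hypothetical helper: create a virtualenv, then install the tested PyTorch
# version and the repo's requirements. Run from the repository root.
setup_maskris_env() {
  local venv_dir="$1"
  if [ -z "$venv_dir" ]; then
    echo "usage: setup_maskris_env VENV_DIR" >&2
    return 2
  fi
  # Python 3.8 is the tested version; assumes python3.8 is installed
  python3.8 -m venv "$venv_dir" || return 1
  # Default PyTorch 1.11.0 build; swap in a CUDA-specific wheel if needed
  "$venv_dir/bin/pip" install torch==1.11.0 || return 1
  # Remaining dependencies pinned by the repository
  "$venv_dir/bin/pip" install -r requirements.txt
}
```

For example, `setup_maskris_env .venv` followed by `. .venv/bin/activate`. Keeping the steps in a function makes it fail early (return code 2) when no target directory is given.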

Datasets

1. Text Annotations

RefCOCO Series Annotations
- Download locations:
  - RefCOCO, RefCOCO+, RefCOCOg: Follow the instructions in refer/README.md
  - Combined annotations (refcocom): Google Drive link

2. Image Data

COCO Dataset
- Source: official COCO website
- Required file: train2014.zip (83K images, 13GB)
- Instructions:
  1. Download the first link: "2014 Train images [83K/13GB]"
  2. Extract the downloaded train2014.zip file
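The two steps above can be scripted as a sketch, assuming the standard COCO download URL (`http://images.cocodataset.org/zips/train2014.zip`, where the archive is named train2014.zip) and that `wget` and `unzip` are installed. `fetch_coco_train2014` is a hypothetical helper, not part of the repository.

```shell
# Hypothetical helper: download and extract COCO train2014 into DATA_PATH.
fetch_coco_train2014() {
  local data_path="$1"
  if [ -z "$data_path" ]; then
    echo "usage: fetch_coco_train2014 DATA_PATH" >&2
    return 2
  fi
  mkdir -p "$data_path" || return 1
  # -c resumes a partial download instead of restarting the 13GB transfer
  wget -c -P "$data_path" http://images.cocodataset.org/zips/train2014.zip || return 1
  # The archive should unpack into a train2014/ folder under DATA_PATH
  unzip -q "$data_path/train2014.zip" -d "$data_path"
}
```

For example, `fetch_coco_train2014 /path/to/DATA_PATH`. The zip can be deleted after extraction to reclaim disk space.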

3. Data structure

Organize the data under DATA_PATH as follows:

DATA_PATH
    ├── refcocom
    ├── train2014
    └── refer
        ├── refcoco
        ├── refcoco+
        └── refcocog
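Before training, it can help to confirm that the layout above is in place. This is a sketch that only checks for the directories shown in the tree; `check_maskris_data` is a hypothetical helper, and it does not validate the files inside each folder.

```shell
# Hypothetical helper: verify that DATA_PATH matches the expected layout.
# Returns 0 if all directories exist, 1 if any are missing, 2 on bad usage.
check_maskris_data() {
  local root="$1" missing=0 d
  if [ -z "$root" ]; then
    echo "usage: check_maskris_data DATA_PATH" >&2
    return 2
  fi
  for d in refcocom train2014 refer/refcoco refer/refcoco+ refer/refcocog; do
    if [ ! -d "$root/$d" ]; then
      echo "missing: $root/$d" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_maskris_data /path/to/DATA_PATH` prints one line per missing directory, so every gap is reported at once rather than one per failed training launch.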

Pretrained Models

- Image encoder: Swin Transformer-Base
- Text encoder: BERT-Base

Usage

By default, we use fp16 training for efficiency. To train a model on refcoco with 2 GPUs, set DATA_PATH, REFER_PATH, SWIN_PATH, and OUTPUT_PATH in scripts/script.sh, then run:

bash scripts/script.sh

To train on a different dataset, set DATASET to refcoco+, refcocog, or refcocom. RefCOCOg has two splits (umd and google); specify one by adding --splitBy umd or --splitBy google.
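The dataset/split rule above can be checked before launching a run. This is a sketch: `split_args` is a hypothetical helper, not part of the repository, and how its output is appended to the training command depends on the contents of scripts/script.sh.

```shell
# Hypothetical helper: print the extra arguments a dataset needs.
# Only refcocog requires --splitBy (umd or google).
split_args() {
  local dataset="$1" split="$2"
  case "$dataset" in
    refcocog)
      case "$split" in
        umd|google) printf '%s\n' "--splitBy $split" ;;
        *) echo "refcocog needs a split: umd or google" >&2; return 1 ;;
      esac ;;
    refcoco|refcoco+|refcocom) : ;;  # no extra arguments required
    *) echo "unknown dataset: $dataset" >&2; return 1 ;;
  esac
}
```

For example, `split_args refcocog umd` prints `--splitBy umd`, while `split_args refcocog` fails with a message instead of silently training on an unspecified split.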

Citation

@article{lee2024maskris,
  title={MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation},
  author={Lee, Minhyun and Lee, Seungho and Park, Song and Han, Dongyoon and Heo, Byeongho and Shim, Hyunjung},
  journal={arXiv preprint arXiv:2411.19067},
  year={2024}
}

References

This repo is mainly built on CARIS and mmdetection. Thanks for their great work!

License

MaskRIS
Copyright (c) 2024-present NAVER Cloud Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
