Minhyun Lee1,2, Seungho Lee1, Song Park2, Dongyoon Han2, Byeongho Heo2, Hyunjung Shim3
1Yonsei University, 2 NAVER AI LAB, 3KAIST
Official PyTorch implementation of "MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation" | arxiv.
Referring Image Segmentation (RIS) is a vision-language task that identifies and segments objects in images based on free-form text descriptions. This study investigates effective data augmentation strategies and proposes a novel framework called Masked Referring Image Segmentation (MaskRIS). MaskRIS employs image and text masking to improve model robustness against occlusions and incomplete information. Experimental results show that MaskRIS integrates with existing models and achieves state-of-the-art performance on RefCOCO, RefCOCO+, and RefCOCOg datasets in both fully supervised and weakly supervised settings.
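As a rough illustration of what "image and text masking" can mean, the sketch below hides a random subset of image patches and a random subset of words in the referring expression. It is a toy version written with plain Python lists, not the repository's implementation: the function names, the `[MASK]` token, the patch size, and the masking ratios are all assumptions chosen for readability.

```python
import random


def mask_text(words, ratio, rng, mask_token="[MASK]"):
    """Replace about `ratio` of the words with `mask_token` (assumed token)."""
    n_mask = int(len(words) * ratio)
    masked_idx = set(rng.sample(range(len(words)), n_mask))
    return [mask_token if i in masked_idx else w for i, w in enumerate(words)]


def mask_image(image, patch, ratio, rng, fill=0):
    """Fill about `ratio` of the non-overlapping patch x patch blocks with `fill`.

    `image` is a list of rows (H x W); a new image is returned and the
    input is left unchanged.
    """
    height, width = len(image), len(image[0])
    blocks = [(r, c) for r in range(0, height, patch)
              for c in range(0, width, patch)]
    chosen = rng.sample(blocks, int(len(blocks) * ratio))
    out = [row[:] for row in image]
    for r, c in chosen:
        for y in range(r, min(r + patch, height)):
            for x in range(c, min(c + patch, width)):
                out[y][x] = fill
    return out
```

For example, `mask_text("the man in the red shirt".split(), 0.5, random.Random(0))` returns the same six-word list with three positions replaced by `[MASK]`; which positions are replaced depends on the seed.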
- Nov 28, 2024: The arXiv paper is released.
This code is tested with:
- Python 3.8
- PyTorch 1.11.0
Other dependencies are listed in requirements.txt.
- RefCOCO Series Annotations
  - Download locations:
    - RefCOCO, RefCOCO+, RefCOCOg: Follow instructions in .refer/README.md
    - Combined annotations (refcocom): Google Drive Link
- COCO Dataset
  - Source: COCO Official Website
  - Required file: train_2014.zip (83K images, 13GB)
  - Instructions:
    - Download from the first link: "2014 Train images [83K/13GB]"
    - Extract the downloaded train_2014.zip file
- Data paths should be as follows:

    DATA_PATH
    ├── refcocom
    ├── train2014
    └── refer
        ├── refcoco
        ├── refcoco+
        └── refcocog
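Before launching training, it can help to confirm that the directories are where the layout above expects them. The helper below is a small sketch using only the standard library; `missing_dirs` and `EXPECTED_DIRS` are hypothetical names introduced here, and the check only looks at directory names, not at their contents.

```python
from pathlib import Path

# Sub-directories named in the layout above, relative to DATA_PATH.
EXPECTED_DIRS = [
    "refcocom",
    "train2014",
    "refer/refcoco",
    "refer/refcoco+",
    "refer/refcocog",
]


def missing_dirs(data_path):
    """Return the expected sub-directories that do not exist under data_path."""
    root = Path(data_path)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling `missing_dirs("/path/to/DATA_PATH")` returns an empty list when every directory is present, and otherwise lists the ones still to be downloaded or extracted.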
- Image Encoder: Swin Transformer-Base
- Text Encoder: BERT-Base
By default, we use fp16 training for efficiency. To train a model on refcoco with 2 GPUs, modify DATA_PATH, REFER_PATH, SWIN_PATH, and OUTPUT_PATH in scripts/script.sh, then run:

    bash scripts/script.sh
To train on a different dataset, change DATASET to refcoco+, refcocog, or refcocom.
Note that RefCOCOg has two splits (umd and google). Add --splitBy umd or --splitBy google to specify which one to use.
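The dataset and split rules above can be captured in a small validation helper, for instance when generating training commands from a wrapper script. This is a sketch only: `split_args` is a hypothetical function, and it assumes that datasets other than refcocog need no extra split flag, which the text above does not state explicitly.

```python
# Dataset names and RefCOCOg splits as listed in this README.
DATASETS = ("refcoco", "refcoco+", "refcocog", "refcocom")
REFCOCOG_SPLITS = ("umd", "google")


def split_args(dataset, split_by=None):
    """Return the extra command-line arguments implied by the dataset choice."""
    if dataset not in DATASETS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if dataset == "refcocog":
        if split_by not in REFCOCOG_SPLITS:
            raise ValueError("refcocog needs split_by='umd' or 'google'")
        return ["--splitBy", split_by]
    # Assumption: the other datasets are trained without an explicit split flag.
    return []
```

For example, `split_args("refcocog", "umd")` yields `["--splitBy", "umd"]`, while `split_args("refcocog")` raises a `ValueError` instead of silently picking a split.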
@article{lee2024maskris,
title={MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation},
author={Lee, Minhyun and Lee, Seungho and Park, Song and Han, Dongyoon and Heo, Byeongho and Shim, Hyunjung},
journal={arXiv preprint arXiv:2411.19067},
year={2024}
}
This repo is built mainly on CARIS and mmdetection. Thanks for their great work!
MaskRIS
Copyright (c) 2024-present NAVER Cloud Corp.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.