MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation

Minhyun Lee<sup>1,2</sup>, Seungho Lee<sup>1</sup>, Song Park<sup>2</sup>, Dongyoon Han<sup>2</sup>, Byeongho Heo<sup>2</sup>, Hyunjung Shim<sup>3</sup>

<sup>1</sup>Yonsei University, <sup>2</sup>NAVER AI LAB, <sup>3</sup>KAIST


Official PyTorch implementation of "MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation" | arXiv.

Abstract

Referring Image Segmentation (RIS) is a vision-language task that identifies and segments objects in images based on free-form text descriptions. This study investigates effective data augmentation strategies and proposes a novel framework called Masked Referring Image Segmentation (MaskRIS). MaskRIS employs image and text masking to improve model robustness against occlusions and incomplete information. Experimental results show that MaskRIS integrates with existing models and achieves state-of-the-art performance on RefCOCO, RefCOCO+, and RefCOCOg datasets in both fully supervised and weakly supervised settings.

Updates

  • Nov 28, 2024: arXiv paper is released.

Requirements

This code is tested with:

  • Python 3.8
  • PyTorch 1.11.0

Other dependencies are listed in requirements.txt.
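As a quick sanity check (a sketch, not part of the repo; the helper name `env_ok` is hypothetical), you can verify that your interpreter and PyTorch versions match the tested setup above:

```python
import sys

def env_ok():
    """Check the environment against the tested versions
    (Python >= 3.8, PyTorch 1.11.x). Returns a (bool, bool) pair."""
    py_ok = sys.version_info[:2] >= (3, 8)
    try:
        import torch  # may not be installed yet
        torch_ok = torch.__version__.startswith("1.11")
    except ImportError:
        torch_ok = False
    return py_ok, torch_ok
```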

Datasets

1. Text Annotations

  • RefCOCO Series Annotations
    • Download locations:
      • RefCOCO, RefCOCO+, RefCOCOg: Follow instructions in .refer/README.md
      • Combined annotations (refcocom): Google Drive Link

2. Image Data

  • COCO Dataset
    • Source: COCO Official Website
    • Required file: train2014.zip (83K images, 13 GB)
    • Instructions:
      1. Download "2014 Train images [83K/13GB]" from the website above
      2. Extract the downloaded train2014.zip file

3. Data structure

  • Data paths should be organized as follows:

    ```
    DATA_PATH
        ├── refcocom
        ├── train2014
        └── refer
            ├── refcoco
            ├── refcoco+
            └── refcocog
    ```
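To catch path mistakes early, a small check like the following can confirm the layout above exists (a sketch; `check_layout` is a hypothetical helper, not part of the repo):

```python
from pathlib import Path

# Expected subdirectories under DATA_PATH, matching the tree above.
EXPECTED = [
    "refcocom",
    "train2014",
    "refer/refcoco",
    "refer/refcoco+",
    "refer/refcocog",
]

def check_layout(data_path):
    """Return the expected subdirectories missing under data_path."""
    root = Path(data_path)
    return [p for p in EXPECTED if not (root / p).is_dir()]
```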

Pretrained Models

Usage

By default, we use fp16 (mixed-precision) training for efficiency. To train a model on refcoco with 2 GPUs, modify DATA_PATH, REFER_PATH, SWIN_PATH, and OUTPUT_PATH in scripts/script.sh, then run:

bash scripts/script.sh

You can change DATASET to refcoco+/refcocog/refcocom to train on other datasets. Note that RefCOCOg has two splits (umd and google); add --splitBy umd or --splitBy google to specify which split to use.
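For example (a sketch: the --splitBy flag is from the README, but whether DATASET is read from the environment or edited inside scripts/script.sh is an assumption; adjust to match the script):

```shell
# Train on RefCOCOg using the umd split.
DATASET=refcocog bash scripts/script.sh --splitBy umd
```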

Citation

@article{lee2024maskris,
  title={MaskRIS: Semantic Distortion-aware Data Augmentation for Referring Image Segmentation},
  author={Lee, Minhyun and Lee, Seungho and Park, Song and Han, Dongyoon and Heo, Byeongho and Shim, Hyunjung},
  journal={arXiv preprint arXiv:2411.19067},
  year={2024}
}

References

This repo is mainly built on CARIS and mmdetection. Thanks for their great work!

License

MaskRIS
Copyright (c) 2024-present NAVER Cloud Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.