RAP-SAM: Towards Real-Time All-Purpose Segment Anything

Shilin Xu · Haobo Yuan · Qingyu Shi · Lu Qi · Jingbo Wang · Yibo Yang · Yining Li · Kai Chen · Yunhai Tong · Bernard Ghanem · Xiangtai Li · Ming-Hsuan Yang
PKU, NTU, UC-Merced, Shanghai AI, KAUST, Google Research

Introduction

We present real-time all-purpose segmentation, which segments and recognizes objects for image, video, and interactive inputs. In addition to benchmarking this setting, we propose a simple yet effective baseline, RAP-SAM, which achieves the best accuracy-speed trade-off across the three tasks.

Method

RAP-SAM is a simple encoder-decoder architecture. It contains a backbone, a lightweight neck, and a shared multitask decoder. Following SAM, we adopt a prompt encoder to encode visual prompts into queries. The same decoder processes both the visual prompt queries and the initial object queries, so the two paths share computation and parameters. To better balance the results for interactive segmentation and image/video segmentation, we add a prompt adapter and an object adapter at the end of the decoder.
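
The routing described above can be illustrated with a toy data-flow sketch. This is not the RAP-SAM implementation: `scale_layer` and `run_shared_decoder` are hypothetical names, and the "layers" are plain scaling functions standing in for learned modules. It only shows that both query types pass through one decoder and then diverge into separate adapters.

```python
from typing import Callable, List, Tuple

Vector = List[float]
Layer = Callable[[List[Vector]], List[Vector]]


def scale_layer(factor: float) -> Layer:
    """Hypothetical stand-in for a learned layer: scales every query vector."""
    def apply(queries: List[Vector]) -> List[Vector]:
        return [[factor * x for x in q] for q in queries]
    return apply


def run_shared_decoder(
    object_queries: List[Vector],
    prompt_queries: List[Vector],
    shared_decoder: Layer,
    object_adapter: Layer,
    prompt_adapter: Layer,
) -> Tuple[List[Vector], List[Vector]]:
    """Route both query types through one decoder, then through separate adapters."""
    n_obj = len(object_queries)
    # Both query sets go through the same decoder, so its parameters are shared.
    decoded = shared_decoder(object_queries + prompt_queries)
    # Split the decoded queries back into their groups and apply each adapter.
    object_out = object_adapter(decoded[:n_obj])
    prompt_out = prompt_adapter(decoded[n_obj:])
    return object_out, prompt_out


obj, prm = run_shared_decoder(
    object_queries=[[1.0, 2.0]],
    prompt_queries=[[3.0, 4.0]],
    shared_decoder=scale_layer(2.0),
    object_adapter=scale_layer(1.0),
    prompt_adapter=scale_layer(0.5),
)
```

In a real model each stand-in would be a neural module. The point of the sketch is only the order of operations: concatenate, decode once, split, adapt.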

Requirements

The detection framework is built upon MMDet3.0.

Install the packages:

pip install mmengine==0.8.4
pip install mmdet==3.3.0
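
After installing, it may help to confirm that the environment matches the pinned versions. The snippet below is a small convenience sketch using only the standard library. The helper names `installed_version` and `find_mismatches` are illustrative and not part of this repository.

```python
from importlib import metadata
from typing import Dict, Optional

# Versions pinned in the install commands above.
PINNED = {"mmengine": "0.8.4", "mmdet": "3.3.0"}


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(
    pinned: Dict[str, str], installed: Dict[str, Optional[str]]
) -> Dict[str, str]:
    """Describe every package whose installed version differs from the pin."""
    problems = {}
    for name, wanted in pinned.items():
        found = installed.get(name)
        if found != wanted:
            problems[name] = f"expected {wanted}, found {found or 'nothing'}"
    return problems


if __name__ == "__main__":
    current = {name: installed_version(name) for name in PINNED}
    for name, message in find_mismatches(PINNED, current).items():
        print(f"{name}: {message}")
```

The script prints nothing when both packages match their pins.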

Generate the classifier with the following command, or download CocoPanopticOVDataset_YouTubeVISDataset_2019.pth and CocoPanopticOVDataset.pth.

PYTHONPATH='.' python tools/gen_cls.py configs/rap_sam/rap_sam_convl_12e_adaptor.py

Data Preparation

The main experiments are conducted on COCO and YouTube-VIS-2019 datasets. Please prepare datasets and organize them like the following:

├── data
    ├── coco
        ├── annotations
            ├── instances_val2017.json
        ├── train2017
        ├── val2017
    ├── youtube_vis_2019
        ├── annotations
            ├── youtube_vis_2019_train.json
            ├── youtube_vis_2019_valid.json
        ├── train    
        ├── valid
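
Before running anything, the layout above can be checked with a short script. This is a sketch, not a tool shipped with the repository: `EXPECTED_PATHS` simply mirrors the tree shown above, and `missing_paths` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# Paths relative to the data root, copied from the directory tree above.
EXPECTED_PATHS = [
    "coco/annotations/instances_val2017.json",
    "coco/train2017",
    "coco/val2017",
    "youtube_vis_2019/annotations/youtube_vis_2019_train.json",
    "youtube_vis_2019/annotations/youtube_vis_2019_valid.json",
    "youtube_vis_2019/train",
    "youtube_vis_2019/valid",
]


def missing_paths(data_root: str) -> List[str]:
    """Return the expected files and folders that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    missing = missing_paths("data")
    if missing:
        print("Missing dataset paths:")
        for rel in missing:
            print(f"  data/{rel}")
    else:
        print("Dataset layout looks complete.")
```

The check only verifies that the paths exist. It does not validate annotation contents or image counts.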

Run Demo

python demo/demo.py demo/demo.jpg configs/rap_sam/eval_rap_sam_coco.py --weights rapsam_r50_12e.pth

Inference

We provide the following checkpoint. Download it, then run the commands below for inference.

rapsam_r50_12e.pth

Test on COCO Panoptic

./tools/dist_test.sh configs/rap_sam/eval_rap_sam_coco.py $CKPT $NUM_GPUS

Test on Video Instance Segmentation

./tools/dist_test.sh configs/rap_sam/eval_rap_sam_yt19.py $CKPT $NUM_GPUS

Test on Interactive Segmentation (COCO-SAM)

./tools/dist_test.sh configs/rap_sam/eval_rap_sam_prompt.py $CKPT $NUM_GPUS
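
The three evaluation commands above differ only in the config file, so they can be assembled from one small table. The following is a convenience sketch, not a script from this repository. The task keys and the `build_test_command` helper are illustrative names, and the GPU count of 1 in the example is an arbitrary assumption.

```python
import shlex
from typing import List

# Config files taken from the three test commands above; the keys are illustrative.
EVAL_CONFIGS = {
    "coco_panoptic": "configs/rap_sam/eval_rap_sam_coco.py",
    "video_instance": "configs/rap_sam/eval_rap_sam_yt19.py",
    "interactive": "configs/rap_sam/eval_rap_sam_prompt.py",
}


def build_test_command(task: str, checkpoint: str, num_gpus: int) -> List[str]:
    """Build the dist_test.sh argument list for one evaluation task."""
    if task not in EVAL_CONFIGS:
        raise ValueError(f"unknown task: {task}")
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    return ["./tools/dist_test.sh", EVAL_CONFIGS[task], checkpoint, str(num_gpus)]


if __name__ == "__main__":
    # Print the commands rather than run them, so they can be reviewed first.
    for task in EVAL_CONFIGS:
        print(shlex.join(build_test_command(task, "rapsam_r50_12e.pth", 1)))
```

To execute a command instead of printing it, the returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.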

Training

The training code will be released soon. Please stay tuned.

Visualization

Interactive Segmentation

VIS Segmentation

COCO Panoptic Segmentation

Citation

@article{xu2024rapsam,
    title={RAP-SAM: Towards Real-Time All-Purpose Segment Anything},
    author={Shilin Xu and Haobo Yuan and Qingyu Shi and Lu Qi and Jingbo Wang and Yibo Yang and Yining Li and Kai Chen and Yunhai Tong and Bernard Ghanem and Xiangtai Li and Ming-Hsuan Yang},
    journal={arXiv preprint},
    year={2024}
}

License

MIT license
