Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation

Zifu Wan¹, Yuhao Wang², Silong Yong¹, Pingping Zhang², Simon Stepputtis¹, Katia Sycara¹, Yaqi Xie¹

¹ Robotics Institute, Carnegie Mellon University, USA
² School of Future Technology, Dalian University of Technology, China

👀Introduction

This repository contains the code for our paper Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation. [Paper]

Sigma is a lightweight and efficient method that balances accuracy and speed. (The results below are computed on the MFNet dataset.)

💡Environment

We test our codebase with PyTorch 1.13.1 + CUDA 11.7 as well as PyTorch 2.2.1 + CUDA 12.1. Please install the PyTorch and CUDA versions that match your computational resources. The steps below show how to create the environment with PyTorch 1.13.1.

Create environment.

conda create -n sigma python=3.9
conda activate sigma

Install all dependencies. Install PyTorch, CUDA, and cuDNN first, then install the other dependencies:

pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117

pip install -r requirements.txt

Install Mamba.

cd models/encoders/selective_scan && pip install . && cd ../../..
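After installation, a quick sanity check can confirm that the interpreter sees the expected PyTorch build. The snippet below is a minimal sketch and not part of the repository: `report_environment` is a hypothetical helper name, and it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def report_environment():
    """Return a dict describing the installed PyTorch build, if any."""
    info = {"torch_installed": importlib.util.find_spec("torch") is not None}
    if not info["torch_installed"]:
        return info

    import torch  # imported lazily so the check also runs before installation

    info["torch_version"] = torch.__version__           # version string of the installed build
    info["cuda_build"] = torch.version.cuda             # CUDA version PyTorch was built against
    info["cuda_available"] = torch.cuda.is_available()  # True if a usable GPU is visible
    return info


if __name__ == "__main__":
    for key, value in report_environment().items():
        print(f"{key}: {value}")
```

If `cuda_available` is reported as `False`, the installed wheel and the local driver likely do not match, and the Mamba extension build in the previous step may be affected as well.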

⏳Setup

Datasets

We use four datasets, covering both RGB-Thermal and RGB-Depth data:
- RGB-Thermal: MFNet and PST900.
- RGB-Depth: NYU Depth V2 and SUN RGB-D.

Please refer to the original dataset websites for more details. You can directly download the processed RGB-Depth datasets from DFormer, though you may need to make small modifications to the txt files.
We also provide the processed datasets we use (both RGB-Thermal and RGB-Depth) here: Google Drive Link.

If you are using your own datasets, please organize the dataset folder in the following structure:

<datasets>
|-- <DatasetName1>
    |-- <RGBFolder>
        |-- <name1>.<ImageFormat>
        |-- <name2>.<ImageFormat>
        ...
    |-- <ModalXFolder>
        |-- <name1>.<ModalXFormat>
        |-- <name2>.<ModalXFormat>
        ...
    |-- <LabelFolder>
        |-- <name1>.<LabelFormat>
        |-- <name2>.<LabelFormat>
        ...
    |-- train.txt
    |-- test.txt
|-- <DatasetName2>
|-- ...

train.txt and test.txt list the names of the items in the training and testing sets, one per line, e.g.:

<name1>
<name2>
...
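Before training on a custom dataset, it can help to verify that every name in a split file has a matching file in each folder. The following is a minimal sketch under stated assumptions: `find_missing_files` is a hypothetical helper (not part of the repository), and the folder names passed to it are placeholders for your own `<RGBFolder>`, `<ModalXFolder>`, and `<LabelFolder>`.

```python
from pathlib import Path


def find_missing_files(dataset_root, folders, split_file="train.txt"):
    """Return (folder, name) pairs listed in the split file but absent on disk."""
    root = Path(dataset_root)
    # One item name per line; blank lines are ignored.
    names = [
        line.strip()
        for line in (root / split_file).read_text().splitlines()
        if line.strip()
    ]
    missing = []
    for folder in folders:
        folder_path = root / folder
        # Compare by file stem so image, modal-X, and label formats may differ.
        stems = {p.stem for p in folder_path.iterdir()} if folder_path.is_dir() else set()
        missing.extend((folder, name) for name in names if name not in stems)
    return missing
```

For example, `find_missing_files("datasets/MyDataset", ("RGB", "Modal", "Label"), "test.txt")` returns an empty list when the test split is complete; the dataset path and folder names here are illustrative only.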

📦Usage

Training

Please download the pretrained VMamba weights:
- VMamba_Tiny.
- VMamba_Small.
- VMamba_Base.

Please put them under pretrained/vmamba/.

Config setting.

Edit the config file in the configs folder.
Set C.backbone to sigma_tiny, sigma_small, or sigma_base to select one of the three versions of Sigma.

Run multi-GPU distributed training:

NCCL_P2P_DISABLE=1 CUDA_VISIBLE_DEVICES="0,1,2,3" python -m torch.distributed.launch --nproc_per_node=4  --master_port 29502 train.py -p 29502 -d 0,1,2,3 -n "dataset_name"

Here, dataset_name=mfnet/pst/nyu/sun, referring to the four datasets.
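When launching several runs, the multi-GPU command shown above can be assembled programmatically. This is only a sketch: `build_train_command` is a hypothetical helper, not something the repository provides, and it uses only the flags and environment variables that appear in the command above.

```python
import shlex

DATASETS = ("mfnet", "pst", "nyu", "sun")


def build_train_command(dataset_name, gpu_ids, port=29502):
    """Return (env, argv) reproducing the multi-GPU launch command."""
    if dataset_name not in DATASETS:
        raise ValueError(f"unknown dataset {dataset_name!r}; expected one of {DATASETS}")
    devices = ",".join(str(g) for g in gpu_ids)
    # Environment variables that prefix the command in the README.
    env = {"NCCL_P2P_DISABLE": "1", "CUDA_VISIBLE_DEVICES": devices}
    argv = [
        "python", "-m", "torch.distributed.launch",
        f"--nproc_per_node={len(gpu_ids)}",   # one process per GPU
        "--master_port", str(port),
        "train.py", "-p", str(port), "-d", devices, "-n", dataset_name,
    ]
    return env, argv


if __name__ == "__main__":
    env, argv = build_train_command("mfnet", [0, 1, 2, 3])
    prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    print(prefix, shlex.join(argv))
```

The returned pair can be passed to `subprocess.run(argv, env={**os.environ, **env})` if you prefer to launch from Python rather than copy the printed command into a shell.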

You can also use single-GPU training:

CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" torchrun -m --nproc_per_node=1 train.py -p 29501 -d 0 -n "dataset_name"

Results will be saved in the log_final folder.

Evaluation

Run the evaluation by:
```
CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python eval.py -d="0" -n "dataset_name" -e="epoch_number" -p="visualize_savedir"
```
Here, dataset_name=mfnet/pst/nyu/sun, referring to the four datasets.
epoch_number is the epoch you want to evaluate. You can also pass a .pth checkpoint path as epoch_number to evaluate a specific set of weights.
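The two accepted forms of the `-e` argument can be illustrated with a short sketch. This is not the repository's implementation: `resolve_checkpoint` is a hypothetical helper, and both the default directory `checkpoints` and the filename pattern `epoch-{}.pth` are assumptions made purely for illustration.

```python
from pathlib import Path


def resolve_checkpoint(epoch_arg, checkpoint_dir="checkpoints", pattern="epoch-{}.pth"):
    """Map an -e style argument to a checkpoint path.

    checkpoint_dir and pattern are hypothetical defaults, not taken from the repository.
    """
    text = str(epoch_arg)
    if text.endswith(".pth"):
        return Path(text)                 # explicit weight file: use it as given
    epoch = int(text)                     # raises ValueError for anything else
    return Path(checkpoint_dir) / pattern.format(epoch)
```

With these assumed defaults, `resolve_checkpoint("100")` points at `checkpoints/epoch-100.pth`, while `resolve_checkpoint("weights/best.pth")` is returned unchanged.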

To use multiple GPUs, specify multiple device IDs:

CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7" python eval.py -d="0,1,2,3,4,5,6,7" -n "dataset_name" -e="epoch_number" -p="visualize_savedir"

Results will be saved in the log_final folder.

📈Results

We provide our trained weights on the four datasets:

MFNet (9 categories)

| Architecture | Backbone | mIoU | Weight |
|:---:|:---:|:---:|:---:|
| Sigma | VMamba-T | 60.2% | Sigma-T-MFNet |
| Sigma | VMamba-S | 61.1% | Sigma-S-MFNet |
| Sigma | VMamba-B | 61.3% | Sigma-B-MFNet |

PST900 (5 categories)

| Architecture | Backbone | mIoU | Weight |
|:---:|:---:|:---:|:---:|
| Sigma | VMamba-T | 88.6% | Sigma-T-PST |
| Sigma | VMamba-S | 87.8% | Sigma-S-PST |

NYU Depth V2 (40 categories)

| Architecture | Backbone | mIoU | Weight |
|:---:|:---:|:---:|:---:|
| Sigma | VMamba-T | 53.9% | Sigma-T-NYU |
| Sigma | VMamba-S | 57.0% | Sigma-S-NYU |

SUN RGB-D (37 categories)

| Architecture | Backbone | mIoU | Weight |
|:---:|:---:|:---:|:---:|
| Sigma | VMamba-T | 50.0% | Sigma-T-SUN |
| Sigma | VMamba-S | 52.4% | Sigma-S-SUN |
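For readers unfamiliar with the metric in the tables above, mean intersection-over-union (mIoU) averages the per-class ratio TP / (TP + FP + FN). The sketch below shows the standard computation from a confusion matrix; `mean_iou` is a hypothetical helper, and the repository's evaluation code may treat ignored labels or absent classes differently.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix (rows: ground truth, columns: prediction)."""
    ious = []
    for c in range(len(confusion)):
        true_pos = confusion[c][c]
        false_neg = sum(confusion[c]) - true_pos                 # class c pixels predicted as another class
        false_pos = sum(row[c] for row in confusion) - true_pos  # other pixels predicted as class c
        denom = true_pos + false_pos + false_neg
        if denom > 0:  # skip classes absent from both labels and predictions
            ious.append(true_pos / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Two-class toy example: IoU is 8/11 for class 0 and 9/12 for class 1.
    print(mean_iou([[8, 2], [1, 9]]))
```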

🙏Acknowledgements

Our dataloader code is based on CMX, and our Mamba code is adapted from Mamba and VMamba. We thank the authors for releasing their code! We also appreciate DFormer for providing their processed RGB-Depth datasets.

📧Contact

If you have any questions, please contact zifuw@andrew.cmu.edu.

📌 BibTeX & Citation

If you find this code useful, please consider citing our work:

@article{wan2024sigma,
  title={Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation},
  author={Wan, Zifu and Wang, Yuhao and Yong, Silong and Zhang, Pingping and Stepputtis, Simon and Sycara, Katia and Xie, Yaqi},
  journal={arXiv preprint arXiv:2404.04256},
  year={2024}
}
