UniTR: A Unified TRansformer-based Framework for Co-object and Multi-modal Saliency Detection [TMM 2024]
Created by Ruohao Guo, Xianghua Ying*, Yanyu Qi, Liao Qu
This repository contains the PyTorch implementation for the paper "UniTR: A Unified TRansformer-based Framework for Co-object and Multi-modal Saliency Detection".
In this paper, we develop a Unified TRansformer-based framework, namely UniTR, aiming to tackle co-object and multi-modal saliency detection tasks with a single unified architecture. Specifically, a transformer module (CoFormer) is introduced to learn the consistency among relevant objects or the complementarity between different modalities. To generate high-quality segmentation maps, we adopt a dual-stream decoding paradigm that allows the extracted consistent or complementary information to better guide mask prediction. Moreover, a feature fusion module (ZoomFormer) is designed to enhance backbone features and capture multi-granularity and multi-semantic information. Extensive experiments show that our UniTR performs well on 17 benchmarks and surpasses existing state-of-the-art approaches.
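For intuition, the core idea behind CoFormer can be sketched as follows: tokens from a group of related images (or from two paired modalities) attend to one another, so shared or complementary cues are emphasized. This is a minimal, hypothetical illustration; the module name, shapes, and layer choices below are assumptions, not the repository's implementation.

```python
# Minimal sketch (NOT the repo's code): tokens from all images/modalities in a
# group attend to each other, so consistent/complementary signals stand out.
import torch
import torch.nn as nn

class GroupAttention(nn.Module):  # hypothetical stand-in for CoFormer
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, feats):
        # feats: (G, N, C) -- G images or modalities, N tokens each, C channels
        g, n, c = feats.shape
        tokens = feats.reshape(1, g * n, c)          # one joint sequence for the group
        out, _ = self.attn(tokens, tokens, tokens)   # every token sees every image/modality
        out = self.norm(out + tokens)                # residual + layer norm
        return out.reshape(g, n, c)

feats = torch.randn(5, 196, 256)                     # e.g. 5 images, 14x14 tokens, 256-d
print(GroupAttention()(feats).shape)                 # torch.Size([5, 196, 256])
```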
conda create -n unitr python=3.8 -y
conda activate unitr
pip install torch==1.11.0 torchvision==0.12.0
pip install timm opencv-python einops
pip install tensorboardX pycocotools imageio scipy moviepy thop
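After installation, a quick sanity check confirms the pinned versions resolved and CUDA is visible:

```python
# Optional sanity check for the environment created above.
import torch, torchvision, timm
print(torch.__version__, torchvision.__version__, timm.__version__)  # expect 1.11.0 / 0.12.0 / ...
print("CUDA available:", torch.cuda.is_available())
```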
- co-segmentation and co-salient object detection (training data: COCO2017; see the group-sampling sketch below):
cd ./co_object_saliency_detection
python main.py
- video salient object detection (training data: DAVIS and FBMS):
cd ./co_object_saliency_detection
python finetune.py
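Co-object training consumes groups of same-category images rather than independent samples, so the model can mine what the group has in common. The snippet below is a simplified, hypothetical illustration of such group sampling; the repository's actual data loader may differ.

```python
# Hypothetical group sampling for co-object training (illustration only).
import random
from collections import defaultdict

def build_groups(samples, group_size=5):
    """samples: list of (image_path, category_id); returns same-category groups."""
    by_cat = defaultdict(list)
    for path, cat in samples:
        by_cat[cat].append(path)
    return [(cat, random.sample(paths, group_size))
            for cat, paths in by_cat.items() if len(paths) >= group_size]

demo = [(f"img_{i:04d}.jpg", i % 3) for i in range(30)]  # toy stand-in for COCO samples
for cat, group in build_groups(demo):
    print(cat, group)
```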
- co-segmentation (checkpoint: unitr_cos_swin.pth, unitr_cos_vgg.pth):
cd ./co_object_saliency_detection
python generate_maps_cos.py
- co-salient object detection (checkpoint: unitr_cosod_swin.pth, unitr_cosod_vgg.pth):
cd ./co_object_saliency_detection
python generate_maps_cosod.py
- video salient object detection (checkpoint: unitr_vsod_swin.pth):
cd ./co_object_saliency_detection
python generate_maps_vsod.py
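The test scripts above write predicted saliency maps to disk. A quick way to spot-check one (the path is only an example, not a fixed layout):

```python
# Spot-check a generated map; the path below is hypothetical.
import imageio
import numpy as np

pred = imageio.imread("maps/unitr_cosod_swin/class_0001/0001.png")
print(pred.shape, pred.dtype, pred.min(), pred.max())  # typically a uint8 grayscale map
mask = pred.astype(np.float32) / 255.0 > 0.5           # binarize if a hard mask is needed
```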
- co-segmentation (results: unitr_cos_swin):
cd ./co_object_saliency_detection
python generate_maps_cos.py
- co-salient object detection (results: unitr_cosod_swin, unitr_cosod_vgg):
cd ./co_object_saliency_detection/eval
sh eval_cosod.sh
- video salient object detection (results: unitr_vsod_swin):
cd ./co_object_saliency_detection/eval
sh eval_vsod.sh
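The scripts above follow the full benchmark protocol. For a quick spot check of a single prediction against its ground truth, two standard saliency metrics (MAE and adaptive F-measure) can be computed by hand; this illustration is not the official evaluation code:

```python
# Standalone MAE and adaptive F-measure for one prediction/GT pair (illustration).
import numpy as np

def mae(pred, gt):
    return np.abs(pred - gt.astype(np.float32)).mean()

def f_measure(pred, gt, beta2=0.3):
    t = min(2 * pred.mean(), 1.0)          # adaptive threshold: twice the mean saliency
    binary = pred >= t
    tp = (binary & (gt > 0.5)).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / ((gt > 0.5).sum() + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)

pred = np.random.rand(256, 256)            # stand-in for a predicted map in [0, 1]
gt = np.random.rand(256, 256) > 0.5        # stand-in for a binary ground truth
print(mae(pred, gt), f_measure(pred, gt))
```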
- RGB-T salient object detection (training data: VT5000):
cd ./multi_modal_saliency_detection/train
python train_rgbt.py
- RGB-D salient object detection (training data: NLPR_NJUD):
cd ./multi_modal_saliency_detection/train
python train_rgbd.py
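RGB-T and RGB-D training consumes aligned image pairs. Below is a simplified, hypothetical way to load one pair; the resize size and the 3-channel repeat of the depth/thermal map are common choices, not necessarily what the training scripts do.

```python
# Hypothetical paired loading for RGB + depth/thermal input (illustration only).
import cv2
import numpy as np
import torch

def load_pair(rgb_path, aux_path, size=384):
    rgb = cv2.cvtColor(cv2.imread(rgb_path), cv2.COLOR_BGR2RGB)
    aux = cv2.imread(aux_path, cv2.IMREAD_GRAYSCALE)          # depth or thermal map
    rgb = cv2.resize(rgb, (size, size)).astype(np.float32) / 255.0
    aux = cv2.resize(aux, (size, size)).astype(np.float32) / 255.0
    aux = np.repeat(aux[..., None], 3, axis=-1)               # match a 3-channel backbone
    to_tensor = lambda a: torch.from_numpy(a).permute(2, 0, 1).unsqueeze(0)
    return to_tensor(rgb), to_tensor(aux)                     # each: (1, 3, size, size)

# rgb, aux = load_pair("0001_rgb.jpg", "0001_depth.png")      # example file names
```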
- RGB-T salient object detection (checkpoint: unitr_rgbt_swin.pth, unitr_rgbt_vgg.pth):
cd ./multi_modal_saliency_detection/test
python generate_maps_rgbt.py
- RGB-D salient object detection (checkpoint: unitr_rgbd_swin.pth, unitr_rgbd_res.pth):
cd ./multi_modal_saliency_detection/test
python generate_maps_rgbd.py
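Since thop is installed above, model complexity can be sanity-checked as follows; the tiny CNN is only a stand-in for the actual UniTR model:

```python
# Measure multiply-accumulates (often reported as FLOPs) and parameter count.
import torch
import torch.nn as nn
from thop import profile

model = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.Conv2d(64, 1, 1))
x = torch.randn(1, 3, 384, 384)
macs, params = profile(model, inputs=(x,))
print(f"MACs: {macs / 1e9:.2f} G, Params: {params / 1e6:.2f} M")
```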
- RGB-T salient object detection (results: unitr_rgbt_swin, unitr_rgbt_vgg):
cd ./multi_modal_saliency_detection/eval
python eval_rgbt.py
- RGB-D salient object detection (results: unitr_rgbd_swin, unitr_rgbd_res):
cd ./multi_modal_saliency_detection/eval
python eval_rgbd.py
If you have suggestions for improving usability or any other advice, please feel free to contact me directly (ruohguo@foxmail.com).
Thanks to SSNM, Swin, UFO, and SwinNet for their contributions to the community!
Please consider citing our paper in your publications if this project helps your research. The BibTeX reference is as follows.
@ARTICLE{10444934,
  author={Guo, Ruohao and Ying, Xianghua and Qi, Yanyu and Qu, Liao},
  journal={IEEE Transactions on Multimedia},
  title={UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection},
  year={2024},
  volume={26},
  pages={7622-7635},
  doi={10.1109/TMM.2024.3369922}}