Code for paper: Salient Object Detection in RGB-D Videos [IEEEXplore][Arxiv]
- RDVS dataset and DCTNet+ model
Figure 1: Due to the limitation of using a single RGB/color modality (image) for SOD (termed RGB SOD), researchers have integrated scene depth information into the SOD task, often referred to as RGB-D SOD. Meanwhile, extending still images to the temporal case yields the video SOD (VSOD) task. We target the RGB-D VSOD task, which can be regarded as an extension of the prevalent RGB-D SOD and VSOD tasks.
To explore this potential task, and as one of the earliest works on RGB-D VSOD, we contribute in two distinct aspects: 1) the dataset, and 2) the model.
We propose a new RGB-D Video Salient Object Dataset incorporating realistic depth information, named RDVS for short. RDVS contains 57 sequences, totaling 4,087 frames, and its annotation process is guided rigorously by gaze data captured from a professional eye-tracker. The collected video clips encompass various challenging scenarios, e.g., complex backgrounds, low contrast, occlusion, and heterogeneous objects. We also provide training and testing splits. Download RDVS from "RDVS Dataset".
Figure 2: Statistics of the proposed RDVS dataset. (a) Attribute-based analyses of RDVS with comparison to DAVIS. (b) The pairwise dependencies across different attributes. (c) Scene/object categories of RDVS. (d) Center bias of RDVS and existing VSOD datasets.
Figure 3: Illustrative frames (with depth in the bottom-right) from RDVS with fixations (red dots, the top row) and the corresponding continuous saliency maps (overlaid on the RGB frames, the bottom row).
Click the above figure to watch saliency shift of all sequences in RDVS dataset (YouTube Link)
Figure 4. Overview of DCTNet+. (a) shows the big picture. (b) and (c) show the details of MAM and RFM, respectively. In the attention operations on the right-hand side in (c), since the coordinate attention and spatial attention processes are similar, the operations of spatial attention are represented in parentheses and are not repeated.
Requirements
- Python 3.9
- PyTorch 1.12.1
- Torchvision 0.13.1
- CUDA 11.6
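As a quick sanity check, the versions listed above can be compared against the local environment. The following is a minimal sketch and not part of the released code: the helper names `matches` and `installed_versions` are hypothetical, and a missing package is simply reported as `None`.

```python
import importlib
import sys

# Versions listed in the requirements above.
EXPECTED = {"python": "3.9", "torch": "1.12.1", "torchvision": "0.13.1", "cuda": "11.6"}


def matches(found, expected):
    """Return True when `found` starts with the expected version components."""
    if found is None:
        return False
    # Drop local build suffixes such as "+cu116" before comparing.
    found_parts = found.split("+")[0].split(".")
    expected_parts = expected.split(".")
    return found_parts[: len(expected_parts)] == expected_parts


def installed_versions():
    """Collect installed versions; a package that cannot be imported is None."""
    versions = {"python": ".".join(map(str, sys.version_info[:3]))}
    for name in ("torch", "torchvision"):
        try:
            versions[name] = importlib.import_module(name).__version__
        except ImportError:
            versions[name] = None
    torch = sys.modules.get("torch")
    # torch.version.cuda is None for CPU-only builds.
    versions["cuda"] = torch.version.cuda if torch is not None else None
    return versions


if __name__ == "__main__":
    for name, found in installed_versions().items():
        status = "ok" if matches(found, EXPECTED[name]) else "differs"
        print(f"{name}: found {found}, expected {EXPECTED[name]} ({status})")
```

A "differs" line does not necessarily mean the code will fail; it only flags a deviation from the environment listed above.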
Training
- Download the pretrained ResNet34 backbone (Baidu Pan | Google Drive) to './model/resnet/pre_train/'.
- Download the training dataset (containing DAVIS16, DAVSOD, FBMS and DUTS-TR) from "Training set and test set" and save it at './dataset/train/*'.
- Download the pretrained RGB, depth and flow stream models from Baidu Pan | Google Drive to './checkpoints/'.
  - Note: the pretrained RGB model should be saved at './checkpoints/spatial', the pretrained depth model at './checkpoints/depth' and the pretrained flow model at './checkpoints/flow'.
- The entire DCTNet+ model was trained on one NVIDIA RTX 3090 GPU.
  - Run `python train.py` in the terminal.
- (PS: pretraining the individual streams)
  - The pretraining code for each stream can be derived from `train.py`. We provide `pretrain_depth.py`, which can also be modified to pretrain the other two streams.
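Before launching training, it can help to confirm that the downloads landed in the directories named in the steps above. The sketch below is an illustration only and not part of the repository: the helper `missing_or_empty` is a hypothetical name, and it only checks that each directory exists and is non-empty, not that the files inside are the right ones.

```python
from pathlib import Path

# Paths quoted in the training steps above, relative to the repository root.
REQUIRED_DIRS = [
    "model/resnet/pre_train",
    "dataset/train",
    "checkpoints/spatial",
    "checkpoints/depth",
    "checkpoints/flow",
]


def missing_or_empty(root="."):
    """Return the required directories that are absent or contain no entries."""
    problems = []
    for rel in REQUIRED_DIRS:
        path = Path(root) / rel
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(rel)
    return problems


if __name__ == "__main__":
    problems = missing_or_empty()
    if problems:
        print("Missing or empty before running train.py:")
        for rel in problems:
            print(f"  ./{rel}")
    else:
        print("All expected directories are populated.")
```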
Testing
- Download the test data (containing DAVIS16, DAVSOD, FBMS, SegTrack-V2 and VOS) from "Training set and test set" and save it at './dataset/test/*'.
- Download the trained model from "DCTNet+ model" (original model ckpt) and set `model_path` in `test.py` to the path where it is saved.
- Run `python test.py` in the terminal.
- Full dataset with realistic depth (4.84G, 57 sequences): Baidu Pan | Google Drive (Update link:2023-10-23)
- Full dataset with synthetic depth (4.46G, 57 sequences): Baidu Pan (Update link:2023-10-23)
- Training Set containing realistic and synthetic depth (2.56G, 32 sequences): Baidu Pan | Google Drive (Update link:2023-10-23)
- Test Set containing realistic and synthetic depth (2.30G, 25 sequences): Baidu Pan | Google Drive (Update link:2023-10-23)
- Note: realistic depth is in "/Depth" and synthetic depth is in "/SyntheticDepth".
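Switching between the two depth sources then comes down to choosing a sub-folder. The following is a minimal sketch under an assumption that is not confirmed here: that each sequence directory contains the "Depth" and "SyntheticDepth" folders directly. The helper names `depth_dir` and `list_depth_maps` are hypothetical.

```python
from pathlib import Path

# Sub-folder names taken from the note above.
DEPTH_DIRS = {"realistic": "Depth", "synthetic": "SyntheticDepth"}


def depth_dir(sequence_dir, kind="realistic"):
    """Return the depth folder of one sequence for the requested depth kind."""
    if kind not in DEPTH_DIRS:
        raise ValueError(f"kind must be one of {sorted(DEPTH_DIRS)}, got {kind!r}")
    # Assumption: the depth folders sit directly inside the sequence directory.
    return Path(sequence_dir) / DEPTH_DIRS[kind]


def list_depth_maps(sequence_dir, kind="realistic"):
    """List the files in the chosen depth folder, sorted by file name."""
    folder = depth_dir(sequence_dir, kind)
    if not folder.is_dir():
        return []
    return sorted(p for p in folder.iterdir() if p.is_file())
```

For example, `list_depth_maps(some_sequence, kind="synthetic")` returns the synthetic depth files of that sequence, or an empty list if the folder is not present.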
- Original model ckpt: Baidu Pan | Google Drive (Update link:2023-10-23)
- Finetune on the test set of RDVS with realistic depth: Baidu Pan | Google Drive (Update link:2023-10-23)
- Finetune on the test set of RDVS with synthetic depth: Baidu Pan | Google Drive (Update link:2023-10-23)
- Note: these are the pseudo RGB-D video datasets used for training and testing our model.
- Training set: Baidu Pan (Update link:2023-10-23)
- Test set: Baidu Pan (Update link:2023-10-23)
Note: the table includes RGB-D SOD models, VSOD models, DCTNet and our DCTNet+ (last row). (Update link:2023-10-23)

| Year | Publisher | Paper | Model | DownloadLink1 | DownloadLink2 |
| --- | --- | --- | --- | --- | --- |
| **RGB-D SOD Models** | | | | | |
| 2020 | ECCV | BBSNet | Code | Baidu | Google |
| 2020 | CVPR | JLDCF | Code | Baidu | Google |
| 2020 | CVPR | S2MA | Code | Baidu | Google |
| 2020 | ECCV | HDFNet | Code | Baidu | Google |
| 2020 | TIP | DPANet | Code | Baidu | Google |
| 2021 | ICCV | SPNet | Code | Baidu | Google |
| 2021 | TIP | CDNet | Code | Baidu | Google |
| 2021 | CVPR | DCF | Code | Baidu | Google |
| 2021 | ACMMM | TriTransNet | Code | Baidu | Google |
| 2021 | ICME | BTSNet | Code | Baidu | Google |
| 2022 | TNNLS | RD3D | Code | Baidu | Google |
| 2022 | TIP | CIRNet | Code | Baidu | Google |
| 2023 | ACMMM | PICRNet | Code | Baidu | Google |
| 2023 | TCSVT | HRTransNet | Code | Baidu | Google |
| **VSOD Models** | | | | | |
| 2018 | ECCV | PDB | Code | Baidu | Google |
| 2019 | ICCV | MGAN | Code | Baidu | Google |
| 2019 | CVPR | SSAV | Code | Baidu | Google |
| 2020 | AAAI | PCSA | Code | Baidu | Google |
| 2021 | ICCV | FSNet | Code | Baidu | Google |
| 2021 | ICCV | DCFNet | Code | Baidu | Google |
| **RGB-D VSOD Models** | | | | | |
| 2022 | ICIP | DCTNet | Code | Baidu | Google |
| -- | -- | DCTNet+ | -- | Baidu | Google |
Note: the results cover DAVIS, DAVSOD-easy, FBMS, SegTrack-V2 and VOS. (Update link:2023-10-23) For results of methods published before 2019, refer to DAVSOD.

| Year | Publisher | Paper | Model | DownloadLink1 | DownloadLink2 |
| --- | --- | --- | --- | --- | --- |
| **VSOD Models** | | | | | |
| 2019 | ICCV | MGAN | Code | Baidu | Google |
| 2020 | AAAI | PCSA | Code | Baidu | Google |
| 2020 | ECCV | TENet | Code | Baidu | Google |
| 2021 | ICCV | FSNet | Code | Baidu | Google |
| 2021 | ICCV | DCFNet | Code | Baidu | Google |
| 2022 | NIPS | UGPL | Code | Baidu | Google |
| 2022 | SPL | MGTNet | Code | Baidu | Google |
| 2023 | TNNLS | CoSTFormer | -- | Baidu | Google |
| **RGB-D VSOD Models** | | | | | |
| 2022 | ICIP | DCTNet | Code | Baidu | Google |
| -- | -- | DCTNet+ | -- | Baidu | Google |
Figure 5. Qualitative comparison of our DCTNet+ model and SOTA methods on conventional VSOD benchmarks.
Figure 6. Qualitative comparison on the proposed RDVS dataset.
Figure 7. Comparison of realistic depth and synthetic depth on the proposed RDVS dataset.
Please cite our paper if you find the work useful:
@inproceedings{lu2022depth,
title={Depth-cooperated trimodal network for video salient object detection},
author={Lu, Yukang and Min, Dingyao and Fu, Keren and Zhao, Qijun},
booktitle={2022 IEEE International Conference on Image Processing (ICIP)},
pages={116--120},
year={2022},
organization={IEEE}
}
@article{mou2024salient,
title={Salient Object Detection in RGB-D Videos},
author={Ao Mou and Yukang Lu and Jiahao He and Dingyao Min and Keren Fu and Qijun Zhao},
journal={IEEE Transactions on Image Processing},
volume={33},
pages={6660--6675},
year={2024}
}
We sincerely thank MPI Sintel, WSVD, Stereo Ego-Motion, SBM-RGBD and TUM-RGBD for their outstanding dataset contributions!
@inproceedings{butler2012naturalistic,
title={A naturalistic open source movie for optical flow evaluation},
author={Butler, Daniel J and Wulff, Jonas and Stanley, Garrett B and Black, Michael J},
booktitle={Computer Vision--ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part VI 12},
pages={611--625},
year={2012},
organization={Springer}
}
@inproceedings{wang2019web,
title={Web stereo video supervision for depth prediction from dynamic scenes},
author={Wang, Chaoyang and Lucey, Simon and Perazzi, Federico and Wang, Oliver},
booktitle={2019 International Conference on 3D Vision (3DV)},
pages={348--357},
year={2019},
organization={IEEE}
}
@inproceedings{stereego,
title={https://lmb.informatik.uni-freiburg.de/resources/datasets/StereoEgomotion/},
author={{Stereo Ego-Motion dataset}},
}
@inproceedings{camplani2017benchmarking,
title={A benchmarking framework for background subtraction in RGBD videos},
author={Camplani, Massimo and Maddalena, Lucia and Moy{\'a} Alcover, Gabriel and Petrosino, Alfredo and Salgado, Luis},
booktitle={New Trends in Image Analysis and Processing--ICIAP 2017: ICIAP International Workshops, WBICV, SSPandBE, 3AS, RGBD, NIVAR, IWBAAS, and MADiMa 2017, Catania, Italy, September 11-15, 2017, Revised Selected Papers 19},
pages={219--229},
year={2017},
organization={Springer}
}
@inproceedings{sturm2012benchmark,
title={A benchmark for the evaluation of RGB-D SLAM systems},
author={Sturm, J{\"u}rgen and Engelhard, Nikolas and Endres, Felix and Burgard, Wolfram and Cremers, Daniel},
booktitle={2012 IEEE/RSJ international conference on intelligent robots and systems},
pages={573--580},
year={2012},
organization={IEEE}
}