Ziyang Luo, Nian Liu, Wangbo Zhao, Xuguang Yang, Dingwen Zhang, Deng-Ping Fan, Fahad Khan, Junwei Han
Approach: [arXiv Paper]
We introduce VSCode, a generalist model with novel 2D prompt learning, to jointly address four salient object detection (SOD) tasks and three camouflaged object detection (COD) tasks. We use VST as the foundation model and introduce 2D prompts within the encoder-decoder architecture to learn domain-specific and task-specific knowledge along two separate dimensions. A prompt discrimination loss helps disentangle these peculiarities and thereby benefits model optimization. VSCode outperforms state-of-the-art methods across six tasks on 26 datasets and, by combining 2D prompts, generalizes zero-shot to unseen tasks such as RGB-D COD.
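To make the zero-shot idea concrete, here is a minimal sketch of how domain prompts and task prompts could be looked up and combined. It is not the repository's implementation: the registry, the function names, and the use of plain strings in place of learnable prompt tensors are all hypothetical, and the list of trained combinations is read off the dataset sections of this README.

```python
from itertools import product

# Hypothetical registries: one prompt per domain and one per task.
# Strings stand in for the learnable prompt tensors of the real model.
# "rgb" means RGB-only input; the other keys name the extra modality.
DOMAIN_PROMPTS = {"rgb": "P_rgb", "depth": "P_depth", "thermal": "P_thermal", "flow": "P_flow"}
TASK_PROMPTS = {"sod": "P_sod", "cod": "P_cod"}

# (domain, task) pairs that appear in joint training, per the dataset sections.
TRAINED = {
    ("rgb", "sod"),      # RGB SOD
    ("depth", "sod"),    # RGB-D SOD
    ("thermal", "sod"),  # RGB-T SOD
    ("flow", "sod"),     # VSOD (optical flow)
    ("rgb", "cod"),      # RGB COD
    ("flow", "cod"),     # VCOD (optical flow)
}


def select_prompts(domain, task):
    """Return the (domain prompt, task prompt) pair for one combination."""
    return DOMAIN_PROMPTS[domain], TASK_PROMPTS[task]


def unseen_combinations():
    """Pairs that can be assembled from trained prompts but were never trained together."""
    return sorted(
        pair for pair in product(DOMAIN_PROMPTS, TASK_PROMPTS) if pair not in TRAINED
    )


if __name__ == "__main__":
    print(unseen_combinations())
    print(select_prompts("depth", "cod"))
```

Under this toy bookkeeping, RGB-D COD is simply the pair ("depth", "cod"): both prompts exist after joint training even though that pair never appeared together in the training data.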
PyTorch
For RGB SOD and RGB-D SOD, we train our model concurrently on the following datasets: the training set of DUTS for RGB SOD, and the training sets of NJUD, NLPR, and DUTLF-Depth for RGB-D SOD.
For testing the RGB SOD task, we use DUTS, ECSSD, HKU-IS, PASCAL-S, DUT-O, and SOD, while STERE, NJUD, NLPR, DUTLF-Depth, SIP, and ReDWeb-S datasets are employed for testing the RGB-D SOD task. You can directly download these datasets by following [VST].
For RGB-T SOD, we train on the training set of VT5000 and test on VT821, VT1000, and the test set of VT5000 (from link). Please download the corresponding contour maps for VT5000 from [baidu,PIN:m9ht] and place them in the RGBT folder.
For VSOD, we use six widely used benchmark datasets: DAVIS, FBMS, ViSal, SegV2, DAVSOD-Easy, and DAVSOD-Normal (from link). Please download the corresponding contour maps and optical flow for DAVIS and DAVSOD from [baidu,PIN:jyzy] and [baidu,PIN:bxi7] (https://pan.baidu.com/s/1IUPH8jG-t2ZlK1Acw1W1oA), and put them into the Video folder.
For the VSOD and VCOD tasks, we follow common practice and use FlowNet 2.0 as the optical flow extractor because of its consistently strong performance.
For RGB COD, we consider three large benchmark datasets: COD10K, CAMO, and NC4K. Please download the corresponding contour maps for COD10K and CAMO from [baidu,PIN:gkq2] and [baidu,PIN:zojp], and put them into the COD/rgb/ folder.
For VCOD, we use two widely accepted benchmark datasets: CAD and MoCA-Mask (from link). Please download the corresponding contour maps and optical flow for MoCA-Mask from [baidu,PIN:tjah], and put them into the COD/rgbv/ folder.
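As a quick consistency check, the test benchmarks listed in the dataset paragraphs can be collected into a small Python table; the six tasks and 26 datasets match the counts given in the introduction. The variable and function names are hypothetical and not part of the repository.

```python
# Test benchmarks per task, transcribed from the dataset paragraphs above.
TEST_SETS = {
    "RGB SOD": ["DUTS", "ECSSD", "HKU-IS", "PASCAL-S", "DUT-O", "SOD"],
    "RGB-D SOD": ["STERE", "NJUD", "NLPR", "DUTLF-Depth", "SIP", "ReDWeb-S"],
    "RGB-T SOD": ["VT821", "VT1000", "VT5000"],
    "VSOD": ["DAVIS", "FBMS", "ViSal", "SegV2", "DAVSOD-Easy", "DAVSOD-Normal"],
    "RGB COD": ["COD10K", "CAMO", "NC4K"],
    "VCOD": ["CAD", "MoCA-Mask"],
}


def count_benchmarks(test_sets):
    """Total number of test datasets summed over all tasks."""
    return sum(len(names) for names in test_sets.values())


if __name__ == "__main__":
    for task, names in TEST_SETS.items():
        print(f"{task:10s} {len(names)} datasets: {', '.join(names)}")
    print("total:", count_benchmarks(TEST_SETS))
```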
The complete dataset folder should look like this:
-- Data
| -- RGB
| | -- DUTS
| | -- ECSSD
...
| -- RGBD
| | -- NJUD
| | -- NLPR
...
| -- RGBT
| | -- VT821
| | -- | RGB
| | -- | GT
| | -- | T
| | -- VT5000
| | | -- Train
| | | -- | RGB
| | | -- | GT
| | | -- | T
| | | -- | Contour
| | | -- Test
...
| -- Video
| | -- Train
| | | -- DAVSOD
| | | | -- select_0043
| | | | -- | RGB
| | | | -- | GT
| | | | -- | Flow
| | | | -- | Contour
| | -- Test
| | | -- DAVIS16
| | | | -- blackswan
| | | | -- | Frame
| | | | -- | GT
| | | | -- | OF_FlowNet2
...
| -- COD
| | -- rgb
| | | -- Train
| | | | -- CAMO
| | | | -- | RGB
| | | | -- | GT
| | | | -- | Contour
| | | -- Test
| | | | -- CAMO
| | | | -- | RGB
| | | | -- | GT
...
| | -- rgbv
| | | -- Train
| | | | -- MoCA_Mask
| | | | | -- TrainDataset_per_sq
| | | | | | -- crab
| | | | | | -- | Imgs
| | | | | | -- | GT
| | | | | | -- | Flow
| | | | | | -- | Contour
| | | -- Test
| | | | -- MoCA_Mask
| | | | | | -- arctic_fox
| | | | | | -- | Imgs
| | | | | | -- | GT
| | | | | | -- | Flow
...
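Before training, it can help to verify that the data folder matches the tree above. The helper below is a hypothetical sketch, not part of the repository: it checks only the directories explicitly shown in the tree (the "..." entries are not expanded), and it requires every sequence folder under the listed video parents to contain the subfolders shown for the example sequences.

```python
from pathlib import Path

# Fixed directories taken from the tree above (a subset; "..." is not expanded).
EXPECTED_DIRS = [
    "RGB/DUTS",
    "RGB/ECSSD",
    "RGBD/NJUD",
    "RGBD/NLPR",
    "RGBT/VT821/RGB",
    "RGBT/VT821/GT",
    "RGBT/VT821/T",
    "RGBT/VT5000/Train/RGB",
    "RGBT/VT5000/Train/GT",
    "RGBT/VT5000/Train/T",
    "RGBT/VT5000/Train/Contour",
    "COD/rgb/Train/CAMO/RGB",
    "COD/rgb/Train/CAMO/GT",
    "COD/rgb/Train/CAMO/Contour",
    "COD/rgb/Test/CAMO/RGB",
    "COD/rgb/Test/CAMO/GT",
]

# Every sequence folder under these parents must contain these subfolders.
SEQUENCE_RULES = {
    "Video/Train/DAVSOD": ["RGB", "GT", "Flow", "Contour"],
    "Video/Test/DAVIS16": ["Frame", "GT", "OF_FlowNet2"],
    "COD/rgbv/Train/MoCA_Mask/TrainDataset_per_sq": ["Imgs", "GT", "Flow", "Contour"],
}


def missing_dirs(root):
    """Return POSIX-style relative paths of expected directories missing under root."""
    root = Path(root)
    missing = [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
    for parent, subdirs in SEQUENCE_RULES.items():
        parent_path = root / parent
        if not parent_path.is_dir():
            missing.append(parent)
            continue
        # Check each sequence folder (e.g. select_0043, blackswan, crab).
        for seq in sorted(p for p in parent_path.iterdir() if p.is_dir()):
            for sub in subdirs:
                if not (seq / sub).is_dir():
                    missing.append((seq / sub).relative_to(root).as_posix())
    return missing


if __name__ == "__main__":
    for rel in missing_dirs("Data"):
        print("missing:", rel)
```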
Run python train_test_eval.py --Training True --Testing True --Evaluation True for training, testing, and evaluation, as in VST.
Please be aware that, for VSOD, our evaluation tool may give slightly different results from the evaluation tool of Zhao Zhang, because some ground-truth maps may not be binarized.
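A minimal sketch of why binarization matters, assuming NumPy, 8-bit grayscale ground-truth maps, and a threshold of 128 chosen only for illustration (neither evaluation tool is confirmed to use it): the same prediction receives a different MAE depending on whether the ground truth is binarized first. All function names are hypothetical.

```python
import numpy as np


def is_binary(gt):
    """Return True if an 8-bit ground-truth map contains only 0 and 255."""
    return bool(np.isin(np.unique(np.asarray(gt)), (0, 255)).all())


def binarize_gt(gt, threshold=128):
    """Turn an 8-bit grayscale ground-truth map into a boolean mask.

    threshold=128 is an illustrative assumption, not a documented setting
    of either evaluation tool.
    """
    return np.asarray(gt) >= threshold


def mae(pred, gt):
    """Mean absolute error between a prediction and a target, both in [0, 1]."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    return float(np.mean(np.abs(pred - gt)))


if __name__ == "__main__":
    # Toy 2x2 ground truth with two soft (non-binary) pixels.
    gt = np.array([[0, 255], [200, 40]], dtype=np.uint8)
    pred = np.array([[0.0, 1.0], [1.0, 0.0]])
    print("binary GT?", is_binary(gt))  # False
    print("MAE vs. soft GT:", mae(pred, gt / 255.0))
    print("MAE vs. binarized GT:", mae(pred, binarize_gt(gt)))  # 0.0
```

On this toy input the soft ground truth penalizes a prediction that the binarized ground truth scores as perfect, which is the kind of discrepancy the note above describes.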
Name | Backbone | Params (M) | Weight |
---|---|---|---|
VSCode-T | Swin-T | 54.09 | [baidu,PIN:mmn1]/[Google Drive] |
VSCode-S | Swin-S | 74.72 | [baidu,PIN:8jig]/[Google Drive] |
VSCode-B | Swin-B | 117.41 | [baidu,PIN:kidl]/[Google Drive] |
We currently provide the prediction maps of VSCode-T [baidu,PIN:gsvf]/[Google Drive], VSCode-S [baidu,PIN:ohf5]/[Google Drive], and VSCode-B [baidu,PIN:uldc]/[Google Drive].
If you use VSCode in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.
@article{luo2023vscode,
title={VSCode: General Visual Salient and Camouflaged Object Detection with 2D Prompt Learning},
author={Luo, Ziyang and Liu, Nian and Zhao, Wangbo and Yang, Xuguang and Zhang, Dingwen and Fan, Deng-Ping and Khan, Fahad and Han, Junwei},
journal={arXiv preprint arXiv:2311.15011},
year={2023}
}