🔥🔥🔥Update 2023.02.19🔥🔥🔥
This is the code for the CVPR 2022 paper "Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation".
-
Download the A2D-Sentences and JHMDB-Sentences datasets, then convert the raw videos into image frames.
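A minimal sketch of the frame-extraction step, assuming `ffmpeg` is installed and the raw clips sit under a hypothetical `A2D/clips320H/` directory (adjust the paths to your own download layout):

```python
import subprocess
from pathlib import Path

def ffmpeg_cmd(video_path, out_dir):
    """Build the ffmpeg command that dumps every frame of `video_path`
    as zero-padded JPEGs into `out_dir`."""
    return ["ffmpeg", "-i", str(video_path),
            "-qscale:v", "2",                     # high-quality JPEG output
            str(Path(out_dir) / "%05d.jpg")]

if __name__ == "__main__":
    # Hypothetical layout: one video clip per sequence id.
    for video in Path("A2D/clips320H").glob("*.mp4"):
        out = Path("A2D/allframes") / video.stem   # one folder per sequence
        out.mkdir(parents=True, exist_ok=True)
        subprocess.run(ffmpeg_cmd(video, out), check=True)
```

The JHMDB videos can be processed the same way, writing into `J-HMDB/allframes/`.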
-
Please use RAFT to generate the optical flow map (visualized in RGB format) from frame t to frame t+1. Since only a few frames are annotated in A2D and JHMDB, optical flow maps are needed only for those frames.
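The RGB visualization maps flow direction to hue and flow magnitude to saturation, so zero motion appears white. A minimal NumPy sketch of this conversion (a simplified stand-in for RAFT's own `flow_viz` utility, with the value channel fixed at 1):

```python
import numpy as np

def flow_to_rgb(flow, max_mag=None):
    """Convert an (H, W, 2) optical-flow field to a uint8 RGB image:
    direction -> hue, magnitude -> saturation, value fixed at 1."""
    u, v = flow[..., 0], flow[..., 1]
    mag = np.sqrt(u ** 2 + v ** 2)
    hue = (np.arctan2(-v, -u) / np.pi + 1.0) / 2.0   # angle mapped to [0, 1]
    if max_mag is None:
        max_mag = mag.max() + 1e-8                   # normalize per image
    sat = np.clip(mag / max_mag, 0.0, 1.0)

    # Minimal HSV -> RGB conversion with V = 1.
    h6 = hue * 6.0
    i = np.floor(h6).astype(int) % 6
    f = h6 - np.floor(h6)
    p = 1.0 - sat
    q = 1.0 - sat * f
    t = 1.0 - sat * (1.0 - f)
    one = np.ones_like(sat)
    rgb = np.stack([
        np.choose(i, [one, q, p, p, t, one]),
        np.choose(i, [t, one, one, q, p, p]),
        np.choose(i, [p, p, t, one, one, q]),
    ], axis=-1)
    return (rgb * 255).astype(np.uint8)
```

In practice you would run the official RAFT model on each annotated frame pair, pass the predicted flow through a conversion like this, and save the result into `allframes_flow/`.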
-
Put them as follows:

your_dataset_dir/
├── A2D/
│   ├── allframes/
│   ├── allframes_flow/
│   ├── Annotations_visualize/
│   └── a2d_txt/
│       ├── train.txt
│       └── test.txt
└── J-HMDB/
    ├── allframes/
    ├── allframes_flow/
    ├── Annotations_visualize/
    └── jhmdb_txt/
        ├── train.txt
        └── test.txt
"Annotations_visualize" contains the GT masks for each target object. We have uploaded them to BaiduPan (extraction code: lo50) for convenience.
Please consider citing our work in your publications if you are interested in our research:
@inproceedings{zhao2022modeling,
  title={Modeling Motion with Multi-Modal Features for Text-Based Video Segmentation},
  author={Zhao, Wangbo and Wang, Kai and Chu, Xiangxiang and Xue, Fuzhao and Wang, Xinchao and You, Yang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={11737--11746},
  year={2022}
}