This is the official PyTorch implementation of Accelerating Video Object Segmentation with Compressed Video (CVPR 2022).
[arXiv] [Project Page]
Kai Xu, Angela Yao
Computer Vision and Machine Learning group, NUS.
Prepare Conda Environment: (We tested the code with python=3.10 and pytorch=1.11; similar versions should also work.)
conda create -n CoVOS python
conda activate CoVOS
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
pip install tqdm tabulate opencv-python easydict ninja scikit-image scikit-video
# Install CUDA motion vector warping function.
python setup.py build_ext --inplace install
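Before building the extension, you may want to confirm that PyTorch and CUDA are set up correctly. The following is a minimal check, not part of the repository:

```python
# Minimal environment check (not part of the repository).
import torch

print("PyTorch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```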
Prepare HEVC feature decoder: (Here are two options.)
- Compile from source (openHEVC_feature_decoder) and set hevc_feature_decoder_path in path_config.py to {install_path}/usr/local/bin/hevc.
git clone https://github.com/kai422/openHEVC_feature_decoder.git
cd openHEVC_feature_decoder
git checkout Interface_MV_Residual
# If the yasm package is not installed, install it first:
sudo apt-get install -y yasm
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=RELEASE ..
make -j9
make DESTDIR={install_path} install
- Use the pre-compiled binary for Ubuntu 18.04 at decoder/bin/hevc. In this case you do not need to update the path.
Download Data:
DAVIS: Download 480p and Full-Resolution data and put them into the same folder. After unzipping, the structure of the directory should be:
{data_path}/
├──DAVIS/
│ ├──Annotations
│ │ └── ...
│ ├──ImageSets
│ │ └── ...
│ └──JPEGImages
│ ├──480p
│ └──Full-Resolution
YouTube-VOS: Download YouTube-VOS 2018. After unzipping, the structure of the directory should be:
{data_path}/
├──YouTube-VOS/
│ ├──train/
│ ├──train_all_frames/
│ ├──valid/
│ └──valid_all_frames/
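As a quick sanity check of the layout above, a small script like the following can verify that the expected folders are in place (a hypothetical helper, not part of the repository; set DATA_PATH to the same value as data_path in path_config.py):

```python
# check_datasets.py -- hypothetical layout check (not part of the repository).
from pathlib import Path

DATA_PATH = Path("/path/to/data")  # same value as data_path in path_config.py

EXPECTED = [
    "DAVIS/Annotations",
    "DAVIS/ImageSets",
    "DAVIS/JPEGImages/480p",
    "DAVIS/JPEGImages/Full-Resolution",
    "YouTube-VOS/train",
    "YouTube-VOS/train_all_frames",
    "YouTube-VOS/valid",
    "YouTube-VOS/valid_all_frames",
]

for rel in EXPECTED:
    status = "ok" if (DATA_PATH / rel).is_dir() else "MISSING"
    print(f"{status:7s} {rel}")
```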
Some video frame indices do not start from 0, so the snippets need to be rearranged:
bash scripts/snippets_rearrange.sh
Update data_path in path_config.py.
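After these steps, path_config.py should contain entries along the following lines (a sketch only; the actual file may define additional paths):

```python
# path_config.py (sketch -- adjust the values for your machine)

# Root folder that contains DAVIS/ and YouTube-VOS/ as described above.
data_path = "/path/to/data"

# HEVC feature decoder binary: the pre-compiled decoder/bin/hevc, or
# {install_path}/usr/local/bin/hevc if you compiled openHEVC_feature_decoder from source.
hevc_feature_decoder_path = "decoder/bin/hevc"
```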
Encode Videos:
Encode raw image sequences into HEVC videos by running:
# To reproduce our results, use FFmpeg 3.4.8-0ubuntu0.2 (the default version on Ubuntu 18.04).
bash scripts/data/encode_video_davis.sh
bash scripts/encode_video_ytvos.sh
Encoded videos will be stored at {data_path}/DAVIS/HEVCVideos and {data_path}/YouTube-VOS/HEVCVideos.
Alternatively, the HEVC-encoded videos can be downloaded from Google Drive.
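For reference, each encoding script essentially feeds a JPEG sequence to FFmpeg's HEVC (libx265) encoder, roughly as sketched below. The scripts set the exact flags, so use them to reproduce the reported results; the paths and output container here are illustrative only.

```python
# Hypothetical sketch of what the encoding scripts do (not part of the repository).
import subprocess

frames = "/path/to/data/DAVIS/JPEGImages/480p/bear/%05d.jpg"  # input JPEG sequence
output = "/path/to/data/DAVIS/HEVCVideos/bear.mp4"            # output video (extension may differ)

# Encode the JPEG sequence with the libx265 (HEVC) encoder.
subprocess.run(
    ["ffmpeg", "-framerate", "24", "-i", frames, "-c:v", "libx265", output],
    check=True,
)
```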
Download pretrained models for the base networks:
- FRTM-VOS: sh model_zoo/FRTM/weights/download_weights.sh
- STM: Download the weights from the STM repository and put them at model_zoo/STM/STM_weights.pth.
- MiVOS: Download the s012 model from the MiVOS repository and put it at model_zoo/MiVOS/saves/topk_stm.pth.
- STCN: Download the s012 model from the STCN repository and put it at model_zoo/STCN/saves/stcn.pth.
CoVOS pretrained models are already included in this repository: weights/covos_light_encoder.pth and weights/covos_propagator.pth.
You can download pre-computed results from Google Drive.
Commands:
DAVIS 16 Val | J | F | J&F | FPS |
---|---|---|---|---|
STM | 88.7 | 89.9 | 89.3 | 14.9 |
STM+CoVOS | 87.0 | 87.3 | 87.2 | 31.5 |
# DAVIS16, base model: stm
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2016_stm.sh
RESULT_PATH=results/covos_stm/dv2016 DSET=dv2016val python evaluate_from_folder.py
DAVIS 16 Val | J | F | J&F | FPS |
---|---|---|---|---|
FRTM-VOS | - | - | 83.5 | 21.9 |
FRTM-VOS+CoVOS | 82.3 | 82.2 | 82.3 | 28.6 |
# DAVIS16, base model: frtm
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2016_frtm.sh
RESULT_PATH=results/covos_frtm/dv2016 DSET=dv2016val python evaluate_from_folder.py
DAVIS 16 Val | J | F | J&F | FPS |
---|---|---|---|---|
MiVOS | 89.7 | 92.4 | 91.0 | 16.9 |
MiVOS+CoVOS | 89.0 | 89.8 | 89.4 | 36.8 |
# DAVIS16, base model: mivos
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2016_mivos.sh
RESULT_PATH=results/covos_mivos/dv2016 DSET=dv2016val python evaluate_from_folder.py
DAVIS 16 Val | J | F | J&F | FPS |
---|---|---|---|---|
STCN | 90.4 | 93.0 | 91.7 | 26.9 |
STCN+CoVOS | 88.5 | 89.6 | 89.1 | 42.7 |
# DAVIS16, base model: stcn
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2016_stcn.sh
RESULT_PATH=results/covos_stcn/dv2016 DSET=dv2016val python evaluate_from_folder.py
DAVIS 17 Val | J | F | J&F | FPS |
---|---|---|---|---|
STM | 79.2 | 84.3 | 81.8 | 10.6 |
STM+CoVOS | 78.3 | 82.7 | 80.5 | 23.8 |
# DAVIS17, base model: stm
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2017_stm.sh
RESULT_PATH=results/covos_stm/dv2017 DSET=dv2017val python evaluate_from_folder.py
DAVIS 17 Val | J | F | J&F | FPS |
---|---|---|---|---|
FRTM-VOS | - | - | 76.7 | 14.1 |
FRTM-VOS+CoVOS | 69.7 | 75.2 | 72.5 | 20.6 |
# DAVIS17, base model: frtm
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2017_frtm.sh
RESULT_PATH=results/covos_frtm/dv2017 DSET=dv2017val python evaluate_from_folder.py
DAVIS 17 Val | J | F | J&F | FPS |
---|---|---|---|---|
MiVOS | 81.8 | 87.4 | 84.5 | 11.2 |
MiVOS+CoVOS | 79.7 | 84.6 | 82.2 | 25.5 |
# DAVIS17, base model: mivos
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2017_mivos.sh
RESULT_PATH=results/covos_mivos/dv2017 DSET=dv2017val python evaluate_from_folder.py
DAVIS 17 Val | J | F | J&F | FPS |
---|---|---|---|---|
STCN | 82.0 | 88.6 | 85.3 | 20.2 |
STCN+CoVOS | 79.7 | 85.1 | 82.4 | 33.7 |
# DAVIS17, base model: stcn
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2017_stcn.sh
RESULT_PATH=results/covos_stcn/dv2017 DSET=dv2017val python evaluate_from_folder.py
YT-VOS 18 Val | G | J_s | F_s | J_u | F_u | FPS |
---|---|---|---|---|---|---|
FRTM-VOS | 72.1 | 72.3 | 76.2 | 65.9 | 74.1 | 7.7 |
FRTM-VOS+CoVOS | 65.6 | 68.0 | 71.0 | 58.2 | 65.4 | 25.3 |
# YouTube-VOS 2018, base model: frtm
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_yt2018_frtm.sh
YT-VOS 18 Val | G | J_s | F_s | J_u | F_u | FPS |
---|---|---|---|---|---|---|
MiVOS | 82.6 | 81.1 | 85.6 | 77.7 | 86.2 | 13 |
MiVOS+CoVOS | 79.3 | 78.9 | 83.0 | 73.5 | 81.7 | 45.9 |
# YouTube-VOS 2018, base model: mivos
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_yt2018_mivos.sh
YT-VOS 18 Val | G | J_s | F_s | J_u | F_u | FPS |
---|---|---|---|---|---|---|
STCN | 84.3 | 83.2 | 87.9 | 79.0 | 87.3 | 16.8 |
STCN+CoVOS | 79.0 | 79.4 | 83.6 | 72.6 | 80.4 | 57.9 |
# YouTube-VOS 2018, base model: stcn
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_yt2018_stcn.sh
This project is released under the GPL-3.0 License. We use code from MiVOS, FRTM-VOS, and DAVIS.
@inproceedings{xu2022covos,
title={Accelerating Video Object Segmentation with Compressed Video},
author={Kai Xu and Angela Yao},
booktitle={CVPR},
year={2022}
}