This repository is the official implementation of Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement (NeurIPS 2020). It is designed for the semi-supervised video object segmentation (VOS) task.
[NeurIPS Page] [Paper] [Supplementary]
Paper corrections: the feature maps generated by our encoders have 1024 channels and 1/16 of the original image size.
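As a quick worked example of this correction, the sketch below computes the encoder feature-map shape for a given frame size. It reads "1/16" as a stride of 16 in each spatial dimension and, for illustration only, assumes the frame is padded up to a multiple of 16 before encoding. The actual padding behavior of the code may differ, and the function name is hypothetical.

```python
import math

STRIDE = 16        # feature maps are 1/16 of the input resolution (per side, assumed)
CHANNELS = 1024    # channels produced by the encoders

def feature_map_shape(height, width):
    """Return (channels, h, w) of the encoder output for an H x W frame.

    Assumption (hypothetical): the frame is padded up to the next multiple
    of 16 before encoding, so each spatial size is rounded up.
    """
    return (CHANNELS, math.ceil(height / STRIDE), math.ceil(width / STRIDE))

if __name__ == "__main__":
    # A typical 480p DAVIS frame is 854 x 480 (width x height).
    print(feature_map_shape(480, 854))  # (1024, 30, 54)
```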
We built and tested the repository on Python 3.6.9 and Ubuntu 18.04 with one NVIDIA 1080Ti card (11GB memory). Running it on Windows or Mac is possible with minor modifications. An NVIDIA GPU and a CUDA environment are required. To install the requirements, run:
pip3 install -r requirements.txt
Install the torch_scatter package following its official instructions. We used version 2.0.4.
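A minimal preflight sketch for comparing your environment with the tested setup. The helper names are hypothetical and not part of the repository; it only reports the Python version, whether torch and torch_scatter can be imported (and their `__version__`), and whether torch sees a CUDA device.

```python
import importlib
import importlib.util
import sys

def _module_version(name):
    """Return the module's __version__, None if not installed, or an error string."""
    if importlib.util.find_spec(name) is None:
        return None
    try:
        module = importlib.import_module(name)
    except Exception as exc:  # e.g. a torch / CUDA binary mismatch
        return "import failed: %s" % exc
    return getattr(module, "__version__", "unknown")

def check_environment(expected_scatter="2.0.4"):
    """Summarize the installed versions relevant to this repository."""
    report = {
        "python": "%d.%d.%d" % tuple(sys.version_info[:3]),
        "torch": _module_version("torch"),
        "torch_scatter": _module_version("torch_scatter"),
    }
    cuda = False
    if report["torch"] is not None and not str(report["torch"]).startswith("import failed"):
        import torch
        cuda = bool(torch.cuda.is_available())
    report["cuda"] = cuda
    scatter = report["torch_scatter"]
    report["scatter_matches"] = scatter is not None and str(scatter).startswith(expected_scatter)
    return report

if __name__ == "__main__":
    for key, value in check_environment().items():
        print(key, value)
```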
- Download and extract DAVIS17-TrainVal dataset.
- Download the pretrained DAVIS17 checkpoint.
- Run:
python3 eval.py --level 1 --resume /path/to/checkpoint.pth --dataset /path/to/dir/
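Before running eval.py, it can help to confirm that the dataset directory points at the extracted DAVIS17 root. The sketch below assumes the standard DAVIS 2017 TrainVal 480p layout (JPEGImages/480p, Annotations/480p, ImageSets/2017/val.txt); the helper names are hypothetical and not part of this repository.

```python
from pathlib import Path

# Standard sub-paths of the extracted DAVIS 2017 TrainVal 480p archive.
DAVIS17_REQUIRED = (
    "JPEGImages/480p",
    "Annotations/480p",
    "ImageSets/2017/val.txt",
)

def missing_davis_paths(root):
    """Return the required DAVIS17 sub-paths that do not exist under root."""
    root = Path(root)
    return [rel for rel in DAVIS17_REQUIRED if not (root / rel).exists()]

def list_val_sequences(root):
    """Read the validation sequence names from ImageSets/2017/val.txt."""
    text = (Path(root) / "ImageSets/2017/val.txt").read_text()
    return [line.strip() for line in text.splitlines() if line.strip()]

if __name__ == "__main__":
    print(missing_davis_paths("/path/to/dir/"))
```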
To reproduce the segmentation scores, you can use the official evaluation tool from the DAVIS benchmark.
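For intuition about what the DAVIS tool reports, its region similarity J is the intersection-over-union between a predicted mask and the ground-truth mask of one object. The sketch below computes J with NumPy on label maps where 0 is background. It is a simplification for illustration only: the official tool also computes the contour accuracy F and aggregates scores per sequence, so use it for any reported numbers.

```python
import numpy as np

def region_similarity(pred, gt):
    """Jaccard index J = |pred & gt| / |pred | gt| for boolean masks.

    Following the usual DAVIS convention, J is 1 when both masks are empty.
    """
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def per_object_j(pred_labels, gt_labels, object_ids):
    """Compute J for each object id in multi-object label maps (0 = background)."""
    pred_labels = np.asarray(pred_labels)
    gt_labels = np.asarray(gt_labels)
    return {k: region_similarity(pred_labels == k, gt_labels == k) for k in object_ids}

if __name__ == "__main__":
    pred = [[1, 1, 0], [0, 2, 2]]
    gt = [[1, 0, 0], [0, 2, 2]]
    print(per_object_j(pred, gt, [1, 2]))  # {1: 0.5, 2: 1.0}
```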
- Download and extract YouTube-VOS18 dataset.
- Download the pretrained YouTube-VOS18 checkpoint.
- Run:
python3 eval.py --level 2 --resume /path/to/checkpoint.pth --dataset /path/to/dir/ --update-rate 0.05
Attention: submitting our results directly to the YouTube-VOS CodaLab server for evaluation will pollute the leaderboard. We encourage you to submit your own results.
- Download and extract Long Videos dataset.
- Download the pretrained YouTube-VOS18 checkpoint above.
- Run:
python3 eval.py --level 3 --resume /path/to/checkpoint.pth --dataset /path/to/dir/ --update-rate 0.05
To reproduce the segmentation scores, you can use the same tool from the DAVIS benchmark.
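The three evaluation commands above differ only in --level, the checkpoint, the dataset path, and --update-rate. The hypothetical helper below (not part of the repository) assembles the same argument lists from the values given in this README, so they can be launched, for example, with subprocess.run; the paths are placeholders.

```python
import shlex

# --level and --update-rate settings as given in this README.
EVAL_SETTINGS = {
    "davis17": {"level": 1, "update_rate": None},
    "youtube_vos18": {"level": 2, "update_rate": 0.05},
    "long_videos": {"level": 3, "update_rate": 0.05},
}

def eval_command(setting, checkpoint, dataset, gpu=0):
    """Build the eval.py argument list for one benchmark setting."""
    cfg = EVAL_SETTINGS[setting]
    cmd = ["python3", "eval.py",
           "--level", str(cfg["level"]),
           "--resume", checkpoint,
           "--dataset", dataset,
           "--gpu", str(gpu)]
    if cfg["update_rate"] is not None:
        cmd += ["--update-rate", str(cfg["update_rate"])]
    return cmd

if __name__ == "__main__":
    cmd = eval_command("long_videos", "/path/to/checkpoint.pth", "/path/to/dir/")
    # Print a shell-ready line; pass cmd to subprocess.run(cmd, check=True) to launch it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```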
Prepare your video frames and the first-frame annotation following the data structure shown on the long videos page. You can inspect the data structure without downloading the dataset, and you only need to provide the first-frame annotation.
Then run eval.py with the same parameters as in the long videos setting.
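A rough sketch only: assuming the long videos data follow a DAVIS-style layout, with frames in JPEGImages/<video>/ and the first-frame PNG in Annotations/<video>/ named after the first frame (this layout is an assumption here; please confirm it against the long videos page), the hypothetical helper below lists videos that are not ready for evaluation.

```python
from pathlib import Path

def videos_missing_first_annotation(root):
    """Return names of videos with no frames or no first-frame PNG annotation.

    Assumed (hypothetical) layout:
        root/JPEGImages/<video>/<frame>.jpg
        root/Annotations/<video>/<first frame>.png
    """
    root = Path(root)
    missing = []
    for video_dir in sorted((root / "JPEGImages").iterdir()):
        if not video_dir.is_dir():
            continue
        frames = sorted(video_dir.glob("*.jpg"))
        if not frames:
            missing.append(video_dir.name)
            continue
        # The first frame in sorted order needs an annotation with the same stem.
        expected = root / "Annotations" / video_dir.name / (frames[0].stem + ".png")
        if not expected.is_file():
            missing.append(video_dir.name)
    return missing

if __name__ == "__main__":
    print(videos_missing_first_annotation("/path/to/dir/"))
```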
Descriptions of useful options:
- --gpu: GPU id to run (default: 0).
- --viz: Enable output overlays along with the estimated masks (default: False).
- --budget: The number of features that can be stored in total (default: 300000 for 1080Ti).
By default, the segmentation results are saved in ./output.
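To make the option semantics concrete, here is a minimal argparse sketch that mirrors the three options described above with the same defaults. It is an illustration, not the actual parser in eval.py, which also defines --level, --resume, --dataset, and --update-rate.

```python
import argparse

def build_parser():
    """Illustrative subset of the eval.py options (not the real parser)."""
    parser = argparse.ArgumentParser(description="Subset of eval.py options")
    parser.add_argument("--gpu", type=int, default=0, help="GPU id to run")
    parser.add_argument("--viz", action="store_true",
                        help="also save overlays along with the estimated masks")
    parser.add_argument("--budget", type=int, default=300000,
                        help="total number of features that can be stored")
    return parser

if __name__ == "__main__":
    args = build_parser().parse_args(["--gpu", "1", "--viz"])
    print(args.gpu, args.viz, args.budget)  # 1 True 300000
```

Since the default budget is stated for an 11GB 1080Ti, a smaller --budget is presumably the first option to try on cards with less memory.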
- Download the following datasets (COCO is the largest one). You don't have to download all of them; by default, our pre-training code skips datasets that don't exist.
- Run unify_pretrain_dataset.py to convert them into a uniform format (following DAVIS).
python3 unify_pretrain_dataset.py --name NAME --src /path/to/dataset/dir/ --dst /path/to/output
- MSRA10K: --name MSRA10K
- ECSSD: --name ECSSD
- PASCAL-S: --name PASCAl-s
- PASCAL VOC2012: --name PASCALVOC2012
- COCO: --name COCO (the pycocotools API is required).
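Because the conversion script is run once per dataset, a small driver can loop over the names listed above. In this sketch the source layout (one sub-folder per dataset under a common root, with the folder names in DATASET_DIRS) is a hypothetical assumption, as is passing the same --dst to every call; adjust both to your setup. Datasets that are not on disk are skipped.

```python
import os

# --name values from this README, mapped to hypothetical folder names.
DATASET_DIRS = {
    "MSRA10K": "MSRA10K",
    "ECSSD": "ECSSD",
    "PASCAl-s": "PASCAL-S",
    "PASCALVOC2012": "VOC2012",
    "COCO": "COCO",
}

def unify_commands(src_root, dst_root):
    """Return one unify_pretrain_dataset.py command per dataset found on disk."""
    commands = []
    for name, folder in DATASET_DIRS.items():
        src = os.path.join(src_root, folder)
        if not os.path.isdir(src):
            continue  # skip datasets you have not downloaded
        commands.append(["python3", "unify_pretrain_dataset.py",
                         "--name", name, "--src", src, "--dst", dst_root])
    return commands

if __name__ == "__main__":
    import subprocess
    for cmd in unify_commands("/path/to/datasets", "/path/to/output"):
        subprocess.run(cmd, check=True)
```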
You may need to make minor modifications to the dataset paths. Descriptions of useful options:
- --palette: Path to the palette image. We provide a template in assets/mask_palette.png, which follows the DAVIS17 format.
- --workder: The number of parallel threads used to accelerate the conversion (default: 20).
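DAVIS-style masks are palettized (indexed) PNGs: each pixel stores an object id, and the palette maps ids to display colors. Assuming the bundled assets/mask_palette.png follows the common PASCAL VOC color map used by DAVIS annotations (an assumption; the bundled file is authoritative), an equivalent flat palette can be generated as below. The function name is hypothetical.

```python
def voc_palette(num_colors=256):
    """Return a flat [r0, g0, b0, r1, g1, b1, ...] PASCAL VOC-style palette.

    The bits of each index are spread, three at a time, over the high bits
    of the R, G and B channels.
    """
    palette = []
    for index in range(num_colors):
        r = g = b = 0
        c = index
        for shift in range(7, -1, -1):
            r |= (c & 1) << shift
            g |= ((c >> 1) & 1) << shift
            b |= ((c >> 2) & 1) << shift
            c >>= 3
        palette.extend((r, g, b))
    return palette

if __name__ == "__main__":
    p = voc_palette()
    print(p[:12])  # [0, 0, 0, 128, 0, 0, 0, 128, 0, 128, 128, 0]
```

With Pillow, calling putpalette with this list on a uint8 label image ("L" mode) turns it into a palettized "P" image.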
After the conversion process, you can start pre-training the model:
python3 train.py --level 0 --dataset /path/to/pretrain/ --lr 1e-5 --scheduler-step 3 --total-epoch 12 --log
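Assuming --scheduler-step is the period, in epochs, of a step decay (the same rule as PyTorch's StepLR) and using a hypothetical decay factor gamma (the factor actually used by train.py is not stated in this README), the learning rate per epoch can be tabulated as below. It mainly shows how many learning-rate stages each run has: 12 / 3 = 4 for pre-training, 1000 / 200 = 5 for DAVIS17, and 150 / 30 = 5 for YouTube-VOS.

```python
def step_lr(base_lr, epoch, step, gamma=0.5):
    """Learning rate at a 0-indexed epoch under step decay every `step` epochs.

    gamma=0.5 is a hypothetical placeholder, not the repository's value.
    """
    return base_lr * gamma ** (epoch // step)

def schedule(base_lr, step, total_epochs, gamma=0.5):
    """Return the learning rate for every epoch of a run."""
    return [step_lr(base_lr, e, step, gamma) for e in range(total_epochs)]

if __name__ == "__main__":
    # Pre-training settings from this README: --lr 1e-5 --scheduler-step 3 --total-epoch 12
    lrs = schedule(1e-5, 3, 12)
    print(sorted(set(lrs), reverse=True))  # four distinct rates
```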
Pre-training may take days to weeks; you can download our pre-trained checkpoint to save time.
Download the semi-supervised TrainVal 480p set from the DAVIS website, then run:
python3 train.py --level 1 --new --resume /path/to/PreTrain/checkpoint.pth --dataset /path/to/DAVIS17/ --lr 4e-6 --scheduler-step 200 --total-epoch 1000 --log
Download the training set of the YouTube-VOS dataset, then run:
python3 train.py --level 2 --new --resume /path/to/PreTrain/checkpoint.pth --dataset /path/to/YouTubeVOS/train --lr 4e-6 --scheduler-step 30 --total-epoch 150 --log
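The three training stages above share one script and differ in level, initialization, dataset, and schedule. The hypothetical helper below (not part of the repository) collects the hyperparameters quoted in this README into one table and rebuilds the corresponding train.py argument lists; the checkpoint and dataset paths are placeholders.

```python
# Hyperparameters quoted in this README for each training stage.
TRAIN_STAGES = {
    "pretrain": {"level": 0, "lr": "1e-5", "step": 3, "epochs": 12, "new": False},
    "davis17": {"level": 1, "lr": "4e-6", "step": 200, "epochs": 1000, "new": True},
    "youtube_vos": {"level": 2, "lr": "4e-6", "step": 30, "epochs": 150, "new": True},
}

def train_command(stage, dataset, resume=None):
    """Build the train.py argument list for one stage."""
    cfg = TRAIN_STAGES[stage]
    cmd = ["python3", "train.py", "--level", str(cfg["level"])]
    if cfg["new"]:
        cmd.append("--new")
    if resume is not None:
        cmd += ["--resume", resume]
    cmd += ["--dataset", dataset,
            "--lr", cfg["lr"],
            "--scheduler-step", str(cfg["step"]),
            "--total-epoch", str(cfg["epochs"]),
            "--log"]
    return cmd

if __name__ == "__main__":
    print(" ".join(train_command("youtube_vos", "/path/to/YouTubeVOS/train",
                                 resume="/path/to/PreTrain/checkpoint.pth")))
```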
This repository is released for academic use only. If you want to use our code in commercial products, please contact xinli@cct.lsu.edu in advance. If you use our code, please cite our paper:
@inproceedings{NEURIPS2020_liangVOS,
author = {Liang, Yongqing and Li, Xin and Jafari, Navid and Chen, Jim},
booktitle = {Advances in Neural Information Processing Systems},
editor = {H. Larochelle and M. Ranzato and R. Hadsell and M. F. Balcan and H. Lin},
pages = {3430--3441},
publisher = {Curran Associates, Inc.},
title = {Video Object Segmentation with Adaptive Feature Bank and Uncertain-Region Refinement},
url = {https://proceedings.neurips.cc/paper/2020/file/234833147b97bb6aed53a8f4f1c7a7d8-Paper.pdf},
volume = {33},
year = {2020}
}
- 2022/04/24 Updated the evaluation script for the long video benchmark.