VITA: Video Instance Segmentation via Object Token Association (NeurIPS 2022)

Miran Heo^*, Sukjun Hwang^*, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim (*equal contribution)

[arXiv] [BibTeX]

Updates

Jan 20, 2023: Our new online VIS method "GenVIS" is available here!
Sep 14, 2022: VITA is accepted to NeurIPS 2022!
Aug 15, 2022: Code and pretrained weights are now available! Thanks for your patience :)

Installation

See installation instructions.

Getting Started

We provide a script, train_net_vita.py, that trains all the configs provided in VITA.

To train a model with train_net_vita.py on VIS, first set up the corresponding datasets following Preparing Datasets for VITA.

Then run training with the COCO-pretrained weights from the Model Zoo:

python train_net_vita.py --num-gpus 8 \
  --config-file configs/youtubevis_2019/vita_R50_bs8.yaml \
  MODEL.WEIGHTS vita_r50_coco.pth

To evaluate a model's performance, use:

python train_net_vita.py \
  --config-file configs/youtubevis_2019/vita_R50_bs8.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
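
When the training and evaluation commands are launched from a Python driver (for example, to sweep over several configs), it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of the VITA codebase: the helper name build_vita_command is hypothetical, and it uses only the flags shown in the commands above (--num-gpus, --config-file, --eval-only, and the trailing MODEL.WEIGHTS override).

```python
import shlex


def build_vita_command(config_file, weights, num_gpus=None, eval_only=False):
    """Assemble the argument list for train_net_vita.py (hypothetical helper)."""
    cmd = ["python", "train_net_vita.py"]
    if num_gpus is not None:
        cmd += ["--num-gpus", str(num_gpus)]
    cmd += ["--config-file", config_file]
    if eval_only:
        cmd.append("--eval-only")
    # The trailing KEY VALUE pair mirrors the MODEL.WEIGHTS argument in the commands above.
    cmd += ["MODEL.WEIGHTS", weights]
    return cmd


config = "configs/youtubevis_2019/vita_R50_bs8.yaml"
train_cmd = build_vita_command(config, "vita_r50_coco.pth", num_gpus=8)
eval_cmd = build_vita_command(config, "/path/to/checkpoint_file", eval_only=True)

# Print shell-quoted versions of the two commands for inspection.
print(shlex.join(train_cmd))
print(shlex.join(eval_cmd))
```

To actually launch a run, one option is to pass the returned list to subprocess.run(cmd, check=True) from the repository root; the paths above are the placeholders from this README and need to point at real files.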

Model Zoo

Pretrained weights on COCO

Name	R-50	R-101	Swin-L
VITA	model	model	model

YouTubeVIS-2019

Name	Backbone	AP	AP50	AP75	AR1	AR10	Download
VITA	R-50	49.8	72.6	54.5	49.4	61.0	model
VITA	Swin-L	63.0	86.9	67.9	56.3	68.1	model

YouTubeVIS-2021

Name	Backbone	AP	AP50	AP75	AR1	AR10	Download
VITA	R-50	45.7	67.4	49.5	40.9	53.6	model
VITA	Swin-L	57.5	80.6	61.0	47.7	62.6	model

OVIS

Name	Backbone	AP	AP50	AP75	AR1	AR10	Download
VITA	R-50	19.6	41.2	17.4	11.7	26.0	model
VITA	Swin-L	27.7	51.9	24.9	14.9	33.0	model

License

The majority of VITA is licensed under an Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), IFC (Apache-2.0 License), Mask2Former (MIT License), and Deformable-DETR (Apache-2.0 License).

Citing VITA

If you use VITA in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@inproceedings{VITA,
  title={VITA: Video Instance Segmentation via Object Token Association},
  author={Heo, Miran and Hwang, Sukjun and Oh, Seoung Wug and Lee, Joon-Young and Kim, Seon Joo},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

Acknowledgement

Our code is largely based on Detectron2, IFC, Mask2Former, and Deformable DETR. We are truly grateful for their excellent work.

