Zhenyu Jiang, Hanwen Jiang, Yuke Zhu
Project | arXiv | Hugging Face Model
2023-09-26: Initial code release.
We provide a pre-trained Doduo model on the Hugging Face Hub. To use it, run the following Python code:
```python
from transformers import AutoModel
from PIL import Image

model = AutoModel.from_pretrained("stevetod/doduo", trust_remote_code=True)
frame_src = Image.open("path/to/src/frame.png")
frame_dst = Image.open("path/to/dst/frame.png")
# Predict dense flow mapping pixels in frame_src to locations in frame_dst
flow = model(frame_src, frame_dst)
```
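The returned `flow` is a dense displacement field. As a rough sketch of how it might be consumed — assuming it comes back as a `(1, 2, H, W)` tensor of per-pixel (x, y) displacements, which you should verify against the model card — you could look up the correspondence of a single source pixel like this:

```python
# Hypothetical usage sketch: assumes `flow` is a (1, 2, H, W) torch.Tensor
# of (dx, dy) pixel displacements; check the model card for the exact format.
x, y = 160, 120                # a query pixel in the source frame
dx = flow[0, 0, y, x].item()   # horizontal displacement at (x, y)
dy = flow[0, 1, y, x].item()   # vertical displacement at (x, y)
print(f"({x}, {y}) in frame_src maps to ({x + dx:.1f}, {y + dy:.1f}) in frame_dst")
```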
- Create a conda environment and install the necessary packages. You can modify the `pytorch` and `cuda` versions in the `env.yaml` file.
```bash
conda env create -f env.yaml
```
- The data path is stored in `.env`. Run `cp .env.example .env` to create a `.env` file. You can modify this file to change your data path; a quick sanity-check sketch follows this list.
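After these setup steps, a quick sanity check (a convenience sketch, not part of the repo) can confirm that the environment has a CUDA-enabled PyTorch build and show the configured data path. The `DATA_ROOT` variable name below is hypothetical; use whatever key is actually defined in your `.env` file:

```python
import os
import torch

# Confirm the conda environment has a CUDA-enabled PyTorch build
print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())

# Inspect the data path configured in .env; "DATA_ROOT" is a hypothetical
# variable name -- substitute the key defined in your .env file
print("data path:", os.environ.get("DATA_ROOT", "<not set>"))
```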
We use frames from the YouTube-VOS dataset for training. Download the data from this source.
Note: We use Mask2Former to generate instance masks for visible region discovery. You can find the predicted masks here. After downloading, unzip the file and place it in the `Youtube-VOS/train/` directory.
We evaluate point correspondence on the DAVIS validation set from the TAP-Vid benchmark. Please download the data from here.
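For intuition, point-correspondence benchmarks of this kind typically report the fraction of predicted points that land within some pixel threshold of the ground truth. Below is a minimal sketch of such a metric; it illustrates the idea (with an arbitrary 8-pixel threshold) and is not the repo's evaluation code:

```python
import numpy as np

def fraction_within_threshold(pred_pts, gt_pts, thresh=8.0):
    """Fraction of predicted points within `thresh` pixels of ground truth.

    pred_pts, gt_pts: (N, 2) arrays of (x, y) pixel coordinates.
    Illustrates TAP-Vid-style position accuracy, not Doduo's exact code.
    """
    dists = np.linalg.norm(pred_pts - gt_pts, axis=1)
    return float((dists <= thresh).mean())

# Example: 3 predicted vs. ground-truth points
pred = np.array([[10.0, 12.0], [50.0, 40.0], [100.0, 90.0]])
gt = np.array([[11.0, 12.0], [60.0, 48.0], [101.0, 92.0]])
print(fraction_within_threshold(pred, gt))  # 2/3 of points within 8 px
```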
You can download the pre-trained model using this link.
We provide two demo notebooks for Doduo:
- Visualizing correspondence with any local checkpoint: make sure you have set up the environment above before running this notebook.
- Visualizing correspondence with the Hugging Face model: no environment setup is required to run this notebook.
You can use the following commands to start training the model:
```bash
# single GPU debug
python src/train.py model.mixed_precision=True experiment=doduo_train debug=fdr

# multiple GPUs + wandb logging
torchrun --rdzv_backend=c10d --rdzv_endpoint=localhost:0 --nnodes=1 --nproc_per_node=4 src/train.py model.mixed_precision=True experiment=doduo_train logger=wandb_csv
```
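Here `--nproc_per_node=4` launches one training process per GPU; adjust it to match the number of GPUs on your machine.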
Run the following command, replacing `/path/to/ckpt` with the path to your checkpoint:
```bash
python src/eval.py experiment=doduo_train ckpt_path=/path/to/ckpt
```
- Our code is based on the fantastic Lightning-Hydra-Template.
- We use Unimatch as our backbone.
```bibtex
@inproceedings{jiang2023doduo,
  title={Doduo: Dense Visual Correspondence from Unsupervised Semantic-Aware Flow},
  author={Jiang, Zhenyu and Jiang, Hanwen and Zhu, Yuke},
  booktitle={arXiv preprint arXiv:2309.15110},
  year={2023}
}
```