Created by Wenliang Zhao*, Yongming Rao*, Zuyan Liu*, Benlin Liu, Jie Zhou, Jiwen Lu†
This repository contains PyTorch implementation for paper "Unleashing Text-to-Image Diffusion Models for Visual Perception" (ICCV 2023).
VPD (Visual Perception with Pre-trained Diffusion Models) is a framework that leverages the high-level and low-level knowledge of a pre-trained text-to-image diffusion model for downstream visual perception tasks.
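As a rough sketch of the idea (the module names below are placeholders, not the actual implementation in this repo), VPD runs the image latents through the pre-trained Stable Diffusion UNet conditioned on text prompts and feeds the resulting multi-scale feature maps to a lightweight task-specific head:

```python
import torch.nn as nn

class VPDSketch(nn.Module):
    """Illustrative only; the real implementation lives in this repo and the
    stable-diffusion submodule. `unet_feature_extractor`, `text_encoder`, and
    `task_head` are hypothetical placeholders."""

    def __init__(self, unet_feature_extractor, text_encoder, task_head):
        super().__init__()
        self.unet = unet_feature_extractor  # wraps the SD denoising UNet and returns
                                            # its intermediate multi-scale feature maps
        self.text_encoder = text_encoder    # CLIP text encoder for the prompts
        self.task_head = task_head          # e.g. a Semantic FPN or depth decoder

    def forward(self, image_latents, prompts):
        text_embed = self.text_encoder(prompts)       # cross-attention conditioning
        feats = self.unet(image_latents, text_embed)  # single denoising forward pass
        return self.task_head(feats)                  # per-task prediction
```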
Clone this repo and initialize the submodules:

```bash
git submodule init
git submodule update
```
Download the checkpoint of stable-diffusion (we use v1-5 by default) and put it in the `checkpoints` folder. Please also follow the instructions in stable-diffusion to install the required packages.
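As an optional sanity check that the checkpoint was placed correctly (the filename below is only an example; use the SD v1-5 checkpoint you actually downloaded):

```python
import torch

# The filename is an example, not a requirement of this repo.
ckpt = torch.load("checkpoints/v1-5-pruned-emaonly.ckpt", map_location="cpu")
state_dict = ckpt.get("state_dict", ckpt)  # SD checkpoints usually nest weights under "state_dict"
print(f"loaded {len(state_dict)} tensors")
```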
Equipped with a lightweight Semantic FPN and trained for 80K iterations on ADE20K, VPD delivers strong semantic segmentation performance.
Please check segmentation.md for detailed instructions.
VPD achieves 73.46, 63.93, and 63.12 oIoU on the validation sets of RefCOCO, RefCOCO+, and G-Ref (RefCOCOg), respectively.
Dataset | P@0.5 | P@0.6 | P@0.7 | P@0.8 | P@0.9 | oIoU | Mean IoU |
---|---|---|---|---|---|---|---|
RefCOCO | 85.52 | 83.02 | 78.45 | 68.53 | 36.31 | 73.46 | 75.67 |
RefCOCO+ | 76.69 | 73.93 | 69.68 | 60.98 | 32.52 | 63.93 | 67.98 |
RefCOCOg | 75.16 | 71.16 | 65.60 | 55.04 | 29.41 | 63.12 | 66.42 |
Please check refer.md for detailed instructions on training and inference.
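For reference, the metrics reported above can be computed from binary masks roughly as follows (a simplified sketch, not the evaluation code used in this repo):

```python
import torch

def referring_metrics(preds, gts, thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Simplified sketch: preds and gts are lists of boolean H x W masks,
    one pair per referring expression."""
    inter_sum, union_sum, ious = 0.0, 0.0, []
    hits = {t: 0 for t in thresholds}
    for p, g in zip(preds, gts):
        inter = (p & g).sum().item()
        union = (p | g).sum().item()
        iou = inter / union if union > 0 else 0.0
        inter_sum += inter
        union_sum += union
        ious.append(iou)
        for t in thresholds:
            hits[t] += iou > t
    oiou = inter_sum / union_sum              # overall IoU, pooled over the whole split
    miou = sum(ious) / len(ious)              # mean IoU, averaged per expression
    prec = {f"P@{t}": hits[t] / len(ious) for t in thresholds}
    return oiou, miou, prec
```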
VPD obtains 0.254 RMSE on the NYUv2 depth estimation benchmark, establishing a new state-of-the-art.
Method | RMSE | d1 | d2 | d3 | REL | log_10 |
---|---|---|---|---|---|---|
VPD | 0.254 | 0.964 | 0.995 | 0.999 | 0.069 | 0.030 |
Please check depth.md for detailed instructions on training and inference.
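The depth metrics above are the standard NYUv2 ones; a simplified sketch of their definitions (not the evaluation code used here):

```python
import torch

def depth_metrics(pred, gt):
    """Simplified sketch: pred and gt are positive depth tensors of the same
    shape, already restricted to valid pixels."""
    ratio = torch.max(pred / gt, gt / pred)
    d1 = (ratio < 1.25).float().mean()          # threshold accuracy delta < 1.25
    d2 = (ratio < 1.25 ** 2).float().mean()
    d3 = (ratio < 1.25 ** 3).float().mean()
    rmse = torch.sqrt(((pred - gt) ** 2).mean())
    rel = (torch.abs(pred - gt) / gt).mean()    # absolute relative error
    log10 = torch.abs(torch.log10(pred) - torch.log10(gt)).mean()
    return {"RMSE": rmse, "d1": d1, "d2": d2, "d3": d3, "REL": rel, "log_10": log10}
```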
MIT License
This code is based on stable-diffusion, mmsegmentation, LAVT, and MIM-Depth-Estimation.
If you find our work useful in your research, please consider citing:
```
@article{zhao2023unleashing,
  title={Unleashing Text-to-Image Diffusion Models for Visual Perception},
  author={Zhao, Wenliang and Rao, Yongming and Liu, Zuyan and Liu, Benlin and Zhou, Jie and Lu, Jiwen},
  journal={ICCV},
  year={2023}
}
```