Implementation code for Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models, accepted at the International Conference on Learning Representations (ICLR) 2024.
[Note!!!]: We have released a simplified version of PCDMs that uses only stage 2 and is trained on data from TikTok and DeepFashion to test the model's generalization capability. Because of limited computational power and data, the results are not very stable, so treat this as an experimental version. The weights can be obtained from Google Drive.
You can directly download our test results from Google Drive: (1) PCDMs vs SOTA (2) PCDMs Results.
The PCDMs vs SOTA results compare our method with several state-of-the-art methods: ADGAN, PISE, GFLA, DPTN, CASD, NTED, and PIDM. Each row contains, in order, target_pose, source_image, ground_truth, ADGAN, PISE, GFLA, DPTN, CASD, NTED, PIDM, and PCDMs (ours).
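To pull a single method out of a comparison row, the column order described above can be turned into crop boxes. This is a minimal sketch, not part of the repository: it assumes each row is a horizontal strip of equal-width tiles, which the README does not state, and `column_boxes` is a hypothetical helper name.

```python
# Column order taken from the README; equal-width tiling is an ASSUMPTION.
COLUMNS = [
    "target_pose", "source_image", "ground_truth",
    "ADGAN", "PISE", "GFLA", "DPTN", "CASD", "NTED", "PIDM", "PCDMs",
]


def column_boxes(row_width, row_height, columns=COLUMNS):
    """Return {name: (left, upper, right, lower)} crop boxes for one row."""
    if row_width % len(columns) != 0:
        raise ValueError("row width is not a multiple of the column count")
    tile = row_width // len(columns)
    return {
        name: (i * tile, 0, (i + 1) * tile, row_height)
        for i, name in enumerate(columns)
    }


# Example: a row of eleven 256x256 tiles.
boxes = column_boxes(row_width=11 * 256, row_height=256)
print(boxes["PCDMs"])  # (2560, 0, 2816, 256)
```

The tuples use the (left, upper, right, lower) order that Pillow's `Image.crop` accepts, so they can be passed to it directly if the tiling assumption holds for the downloaded images.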
Third-party Usage: ComfyUI_PCDMs
Download the DWPose weights (dw-ll_ucoco_384.pth, yolox_l_8x8_300e_coco_20211126_140236-d3bd2b23.pth) following this.
# install diffusers & pose extractor
pip install diffusers==0.24.0
pip install controlnet-aux==0.0.7
pip install transformers==4.32.1
pip install accelerate==0.24.1
# install DWPose which is dependent on MMDetection, MMCV and MMPose
pip install -U openmim
mim install mmengine
mim install "mmcv>=2.0.1"
mim install "mmdet>=3.1.0"
mim install "mmpose>=1.1.0"
# clone code
git clone https://github.com/tencent-ailab/PCDMs.git
# download the models
cd PCDMs
mv {weights} ./PCDMs_ckpt.pt
# then you can use the notebook
{pcdms_demo.ipynb}
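Since the pip commands above pin exact versions, it can help to confirm that the active environment matches them before opening the notebook. The following is a minimal sketch using only the standard library; the helper names `installed_version` and `find_mismatches` are illustrative and not part of PCDMs, and the MMCV, MMDetection, and MMPose lower bounds are not checked here.

```python
from importlib import metadata

# Pins copied from the pip commands above.
PINNED = {
    "diffusers": "0.24.0",
    "controlnet-aux": "0.0.7",
    "transformers": "4.32.1",
    "accelerate": "0.24.1",
}


def installed_version(package):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(pinned=PINNED, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that is not satisfied."""
    mismatches = {}
    for package, expected in pinned.items():
        found = lookup(package)
        if found != expected:
            mismatches[package] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for package, (expected, found) in find_mismatches().items():
        print(f"{package}: expected {expected}, found {found or 'not installed'}")
```

An empty result from `find_mismatches()` means all four pinned packages are installed at the listed versions.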
This link contains processed data that is ready for use. The data has been prepared in the following ways:
• Images renamed
• Split into train and test sets
• Keypoints extracted with OpenPose
The folder structure of the dataset should be as follows:
Deepfashion/
├── all_data_png # including train and test images
│ ├── img1.png
│ ├── ...
│ ├── img52712.png
├── train_lst_256_png # including train images of 256 size
│ ├── img1.png
│ ├── ...
│ ├── img48674.png
├── train_lst_512_png # including train images of 512 size
│ ├── img1.png
│ ├── ...
│ ├── img48674.png
├── test_lst_256_png # including test images of 256 size
│ ├── img1.png
│ ├── ...
│ ├── img4038.png
├── test_lst_512_png # including test images of 512 size
│ ├── img1.png
│ ├── ...
│ ├── img4038.png
├── normalized_pose_txt.zip # including pose coordinates of the train and test sets
│ ├── pose_coordinate1.txt
│ ├── ...
│ ├── pose_coordinate40160.txt
├── train_data.json
├── test_data.json
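A quick way to confirm that a local copy matches the layout above is to check for the top-level entries. This is a minimal sketch rather than a script shipped with the repository: `missing_entries` is a hypothetical helper, it treats normalized_pose_txt.zip as a regular file (the tree lists its contents beneath it), and it does not count or open the images.

```python
from pathlib import Path

# Entry names copied from the folder structure above.
EXPECTED_DIRS = [
    "all_data_png",
    "train_lst_256_png",
    "train_lst_512_png",
    "test_lst_256_png",
    "test_lst_512_png",
]
EXPECTED_FILES = ["normalized_pose_txt.zip", "train_data.json", "test_data.json"]


def missing_entries(root):
    """Return the expected entries that are absent under the dataset root."""
    root = Path(root)
    missing = [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


if __name__ == "__main__":
    # "Deepfashion" is the root folder name shown in the tree above.
    print(missing_entries("Deepfashion") or "layout looks complete")
```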
Download img_highres.zip of the DeepFashion dataset from the In-shop Clothes Retrieval Benchmark.
Unzip img_highres.zip. You will need to request the password from the dataset maintainers.
We provide checkpoints for all three stages, available here.
- train/test stage1-prior
sh run_stage1.sh       # train
sh run_test_stage1.sh  # test
- train/test stage2-inpaint
sh run_stage2.sh       # train
sh run_test_stage2.sh  # test
- train/test stage3-refined
sh run_stage3.sh       # train
sh run_test_stage3.sh  # test
If this work is useful to you, please consider citing our paper:
@inproceedings{shenadvancing,
title={Advancing Pose-Guided Image Synthesis with Progressive Conditional Diffusion Models},
author={Shen, Fei and Ye, Hu and Zhang, Jun and Wang, Cong and Han, Xiao and Wei, Yang},
booktitle={The Twelfth International Conference on Learning Representations},
year={2024}
}
If you have any questions, please feel free to contact me at shenfei140721@126.com.