release code
mayuema committed Apr 7, 2023
1 parent 9334da8 commit fbafcb2
Showing 18 changed files with 3,309 additions and 13 deletions.
38 changes: 38 additions & 0 deletions .gitignore
@@ -0,0 +1,38 @@

/checkpoints

/data

/figs

/log

*.log

*.pth

*.jpg

*.og

/logs

/stable_diffusion.egg-info

/__pycache__

/output

*.pyc

/models

*.ckpt

/src

/newsd_weight

/data

/others
83 changes: 70 additions & 13 deletions README.md
@@ -28,7 +28,7 @@
</table >

## 💃💃💃 Abstract
<b>TL;DR: We tune 2D stable-diffusion to generate the character videos from pose and text description.</b>
<b>TL;DR: We tune the text-to-image model (e.g., stable diffusion) to generate the character videos from pose and text description.</b>

<details><summary>CLICK for full abstract</summary>

@@ -38,24 +38,81 @@

## 🕺🕺🕺 Changelog
<!-- A new option store all the attentions in hard disk, which require less ram. -->
- 2023.04.06 The `code` and `huggingface demo` will be coming tomorrow!
- 2023.04.06 Release `code`, `config` and `checkpoints`!
- 2023.04.03 Release Paper and Project page!

## Todo
## 🎤🎤🎤 Todo

- [ ] Release the code, config and checkpoints for teaser
- [ ] Memory and runtime profiling
- [ ] Hands-on guidance of hyperparameters tuning
- [X] Release the code, config and checkpoints for teaser
- [ ] Hugging face gradio demo: in progress
- [ ] Colab
- [ ] Release configs for other result and in-the-wild dataset
- [ ] hugging-face: in progress
- [ ] Release more application
- [ ] Release more applications

## 🍻🍻🍻 Setup Environment
Our method is trained using CUDA 11, `accelerate` and `xformers` on 8 A100 GPUs.
```
conda create -n fupose python=3.8
conda activate fupose
pip install -r requirements.txt
```

`xformers` is recommended on A100 GPUs to save memory and running time.

<details><summary>Click for xformers installation </summary>

We find its installation unstable. You may try the following wheel:

```bash
wget https://github.com/ShivamShrirao/xformers-wheels/releases/download/4c06c79/xformers-0.0.15.dev0+4c06c79.d20221201-cp38-cp38-linux_x86_64.whl
pip install xformers-0.0.15.dev0+4c06c79.d20221201-cp38-cp38-linux_x86_64.whl
```
</details>

Our environment is similar to Tune-A-Video ([official](https://github.com/showlab/Tune-A-Video), [unofficial](https://github.com/bryandlee/Tune-A-Video)). You may check them for more details.

## 💃💃💃 Training
We fix the bug in Tune-A-Video and fine-tune Stable Diffusion v1-4 on 8 A100 GPUs.
To fine-tune the text-to-image diffusion models for text-to-video generation, run this command:

```bash
TORCH_DISTRIBUTED_DEBUG=DETAIL accelerate launch \
--multi_gpu --num_processes=8 --gpu_ids '0,1,2,3,4,5,6,7' \
train_followyourpose.py \
--config="configs/pose_train.yaml"
```

## 🕺🕺🕺 Inference
Once the training is done, run inference:

```bash
TORCH_DISTRIBUTED_DEBUG=DETAIL accelerate launch \
--gpu_ids '0' \
txt2video.py \
--config="configs/pose_sample.yaml"
```
## 💃💃💃 Weight
[Stable Diffusion] [Stable Diffusion](https://arxiv.org/abs/2112.10752) is a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. The pre-trained Stable Diffusion models can be downloaded from Hugging Face (e.g., [Stable Diffusion v1-4](https://huggingface.co/CompVis/stable-diffusion-v1-4)).


[FollowYourPose] We also provide our pretrained checkpoints on [Hugging Face](https://huggingface.co/YueMafighting/FollowYourPose_v1/tree/main). Download them and put them into the `checkpoints` folder to run inference with our models.


```bash
FollowYourPose
├── checkpoints
│ ├── followyourpose_checkpoint-1000
│ │ ├──...
│ ├── stable-diffusion-v1-4
│ │ ├──...
│ └── pose_encoder.pth
```
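
Before launching inference, it can help to confirm that the folder matches the layout above. The following is a minimal sketch, not part of the released code: the helper name `missing_checkpoints` is hypothetical, and the three entry names are copied from the directory tree shown above.

```python
from pathlib import Path

# Entry names copied from the directory tree above.
# The helper itself is a hypothetical convenience, not part of the repository.
EXPECTED_ENTRIES = (
    "followyourpose_checkpoint-1000",
    "stable-diffusion-v1-4",
    "pose_encoder.pth",
)


def missing_checkpoints(root="checkpoints"):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root / name).exists()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("Missing under checkpoints/:", ", ".join(missing))
    else:
        print("All expected checkpoint entries are present.")
```

Run it from the repository root; an empty result means all three entries were found.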


## 💃💃💃 Results with Stable Diffusion
We show results regarding various pose sequences and text prompts.
## 🕺🕺🕺 Results
We show our results regarding various pose sequences and text prompts.

Note that mp4 and gif files in this GitHub page are compressed.
Note that mp4 and gif files in this github page are compressed.
Please check our [Project Page](https://follow-your-pose.github.io/) for mp4 files of original video results.
<table class="center">

@@ -251,7 +308,7 @@ Please check our [Project Page](https://follow-your-pose.github.io/) for mp4 fil

## 👯👯👯 Acknowledgements

This repository borrows heavily from [Tune-A-Video](https://github.com/showlab/Tune-A-Video), [FateZero](https://github.com/ChenyangQiQi/FateZero) and [prompt-to-prompt](https://github.com/google/prompt-to-prompt/). Thanks to the authors for sharing their code and models.
This repository borrows heavily from [Tune-A-Video](https://github.com/showlab/Tune-A-Video). Thanks to the authors for sharing their code and models.

## 🕺🕺🕺 Maintenance

29 changes: 29 additions & 0 deletions configs/pose_sample.yaml
@@ -0,0 +1,29 @@
pretrained_model_path: "./checkpoints/stable-diffusion-v1-4"
output_dir: "output"


validation_data:
  prompts:
    - "Iron man on the beach"
    - "Stormtrooper on the sea"
    - "Astronaut on the beach"
  video_length: 32
  width: 512
  height: 512
  num_inference_steps: 50
  guidance_scale: 12.5
  use_inv_latent: False
  num_inv_steps: 50
  dataset_set: "val"


train_batch_size: 1
validation_steps: 100

resume_from_checkpoint: ./checkpoints/followyourpose_checkpoint-1000


seed: 33
mixed_precision: 'no'
gradient_checkpointing: False
enable_xformers_memory_efficient_attention: True
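
For readers unfamiliar with `guidance_scale`, the sketch below shows the standard classifier-free guidance combination on plain Python lists. It assumes that this repository's pipeline uses the value the same way diffusers-style Stable Diffusion pipelines do; that is an assumption, not something the diff confirms. The function name `apply_guidance` is hypothetical.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale=12.5):
    """Standard classifier-free guidance, element by element.

    noise_uncond: noise predicted with an empty prompt.
    noise_text:   noise predicted with the text prompt.
    Assumption: the pipeline combines them as uncond + scale * (text - uncond).
    """
    return [u + guidance_scale * (t - u) for u, t in zip(noise_uncond, noise_text)]


if __name__ == "__main__":
    # With scale 1.0 the result equals the text-conditioned prediction;
    # larger scales push the prediction further along the prompt direction.
    print(apply_guidance([0.0, 1.0], [1.0, 1.0], guidance_scale=1.0))
    print(apply_guidance([0.0, 1.0], [1.0, 1.0], guidance_scale=12.5))
```

Under this reading, the configured value of 12.5 weights the prompt strongly relative to the unconditional prediction.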
47 changes: 47 additions & 0 deletions configs/pose_train.yaml
@@ -0,0 +1,47 @@
pretrained_model_path: "./checkpoints/stable-diffusion-v1-4"
output_dir: "output"

train_data:
  video_path: "no path"
  prompt: "None"
  n_sample_frames: 12
  width: 512
  height: 512
  sample_start_idx: 0
  sample_frame_rate: 4
  dataset_set: "train"

validation_data:
  prompts:
    - "A Iron man on the beach"
    - "A Spider man on the snow"
    - "A Superman on the street"
    - "A boy on the forest"
  video_length: 24
  width: 512
  height: 512
  num_inference_steps: 50
  guidance_scale: 12.5
  use_inv_latent: False
  num_inv_steps: 50
  dataset_set: "val"


learning_rate: 3e-5
train_batch_size: 1
max_train_steps: 5000
checkpointing_steps: 1000
validation_steps: 100
trainable_modules:
  - "attn1.to_q"
  - "attn2.to_q"
  - "attn_temp"
  - "conv_temporal"
skeleton_path: './pose_example/vis_kun_pose2.mov'


seed: 33
mixed_precision: 'no'
use_8bit_adam: False
gradient_checkpointing: False
enable_xformers_memory_efficient_attention: True
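
As an illustration of the `train_data` sampling keys, the sketch below computes which frames would be read from a clip if `sample_start_idx`, `sample_frame_rate` and `n_sample_frames` are interpreted as a strided slice, as in Tune-A-Video-style datasets. Whether this repository's dataset class does exactly this is an assumption; the function name `sampled_frame_indices` is hypothetical.

```python
def sampled_frame_indices(num_video_frames, sample_start_idx=0,
                          sample_frame_rate=4, n_sample_frames=12):
    """Take every `sample_frame_rate`-th frame starting at `sample_start_idx`,
    keeping at most `n_sample_frames` indices (assumed interpretation)."""
    strided = range(sample_start_idx, num_video_frames, sample_frame_rate)
    return list(strided)[:n_sample_frames]


if __name__ == "__main__":
    # Defaults mirror the train_data values in the config above.
    print(sampled_frame_indices(100))
    # A clip that is too short yields fewer than n_sample_frames indices.
    print(sampled_frame_indices(10))
```

With the values in the config, a clip needs at least 45 frames to supply all 12 samples, since the last index read is 44.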