Commit 9f34f7c

bubbliiiing and hkunzhe authored
Update to V5.1 (#179)
* Update Flow
* Update Flow
* Update Flow
* add image recaptioning
* Fix bug in t2v
* update train_reward_lora.py
* update reward training
* Update V5.1 and mix multi text_encoders to one pipeline
* Update V5.1 training Code
* Update ComfyUI
* Update Comment
* Delete files
* update reward training
* Update Readme
* fix extract frames in compute_semantic_consistency
* Update Readme && Remove to in prediction
* Update Demo
* Update Readme
* Update ui
* support vae gradient checkpointing in reward training
* Update Training Readme

---------

Co-authored-by: hkunzhe <huangkunzhe.hkz@alibaba-inc.com>
1 parent 78415be commit 9f34f7c

75 files changed, +10533 -5572 lines changed

.gitignore

+1
@@ -8,6 +8,7 @@ _*
 __pycache__/
 *.py[cod]
 *$py.class
+scripts_demo*
 
 # C extensions
 *.so

README.md

100644 → 100755
+174 -69
Large diffs are not rendered by default.

README_ja-JP.md

100644 → 100755
+169 -66
Large diffs are not rendered by default.

README_zh-CN.md

100644 → 100755
+170 -65
Large diffs are not rendered by default.

app.py

+8 -4
@@ -19,7 +19,11 @@
 #
 # "sequential_cpu_offload" means that each layer of the model will be moved to the CPU after use,
 # resulting in slower speeds but saving a large amount of GPU memory.
-GPU_memory_mode = "model_cpu_offload"
+#
+# EasyAnimateV1, V2 and V3 support "model_cpu_offload" "sequential_cpu_offload"
+# EasyAnimateV4, V5 support "model_cpu_offload" "model_cpu_offload_and_qfloat8" "sequential_cpu_offload"
+# EasyAnimateV5.1 support "model_cpu_offload" "model_cpu_offload_and_qfloat8"
+GPU_memory_mode = "model_cpu_offload_and_qfloat8"
 # Use torch.float16 if GPU does not support torch.bfloat16
 # ome graphics cards, such as v100, 2080ti, do not support torch.bfloat16
 weight_dtype = torch.bfloat16
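
The new comments in this hunk list which `GPU_memory_mode` values each EasyAnimate version accepts. A minimal sketch of how that table could be checked before a model is loaded is shown below. The dictionary is transcribed from the comments above; the helper name `check_memory_mode` and the lowercase edition keys for versions other than `"v5"` and `"v5.1"` are assumptions for illustration and are not part of the commit.

```python
# Supported GPU_memory_mode values per edition, transcribed from the
# comments added to app.py in this commit. The helper below is a
# hypothetical illustration, not code from the repository.
_ALL_MODES = (
    "model_cpu_offload",
    "model_cpu_offload_and_qfloat8",
    "sequential_cpu_offload",
)

SUPPORTED_MEMORY_MODES = {
    "v1": ("model_cpu_offload", "sequential_cpu_offload"),
    "v2": ("model_cpu_offload", "sequential_cpu_offload"),
    "v3": ("model_cpu_offload", "sequential_cpu_offload"),
    "v4": _ALL_MODES,
    "v5": _ALL_MODES,
    "v5.1": ("model_cpu_offload", "model_cpu_offload_and_qfloat8"),
}


def check_memory_mode(edition: str, mode: str) -> str:
    """Return `mode` if `edition` supports it, otherwise raise ValueError."""
    try:
        allowed = SUPPORTED_MEMORY_MODES[edition]
    except KeyError:
        raise ValueError(f"unknown edition: {edition!r}") from None
    if mode not in allowed:
        raise ValueError(
            f"{mode!r} is not supported by edition {edition!r}; "
            f"choose one of {allowed}"
        )
    return mode
```

Under this table, `check_memory_mode("v5.1", "sequential_cpu_offload")` raises, which matches the comment that V5.1 lists only the two `model_cpu_offload` variants.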
@@ -29,11 +33,11 @@
 server_port = 7860
 
 # Params below is used when ui_mode = "modelscope"
-edition = "v5"
+edition = "v5.1"
 # Config
-config_path = "config/easyanimate_video_v5_magvit_multi_text_encoder.yaml"
+config_path = "config/easyanimate_video_v5.1_magvit_qwen.yaml"
 # Model path of the pretrained model
-model_name = "models/Diffusion_Transformer/EasyAnimateV5-12b-zh-InP"
+model_name = "models/Diffusion_Transformer/EasyAnimateV5.1-12b-zh-InP"
 # "Inpaint" or "Control"
 model_type = "Inpaint"
 # Save dir
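
This hunk moves `edition`, `config_path` and `model_name` from V5 to V5.1 together, which suggests the three settings are expected to agree. The sketch below shows one way to catch a half-finished upgrade by checking that both paths carry the same version tag as `edition`. The function name `settings_match_edition` and the tag-matching rule are assumptions for illustration; the repository may not perform such a check.

```python
import re


def settings_match_edition(edition: str, config_path: str, model_name: str) -> bool:
    """Return True when both paths contain the version tag given by `edition`.

    Hypothetical helper: the tag must not be followed by further version
    characters, so "v5" does not match inside "v5.1".
    """
    pattern = re.compile(re.escape(edition) + r"(?!\.?\d)", re.IGNORECASE)
    return bool(pattern.search(config_path)) and bool(pattern.search(model_name))


# Values taken from the new side of the hunk above.
ok = settings_match_edition(
    "v5.1",
    "config/easyanimate_video_v5.1_magvit_qwen.yaml",
    "models/Diffusion_Transformer/EasyAnimateV5.1-12b-zh-InP",
)
```

The negative lookahead is the design choice worth noting: a plain substring test would accept `edition = "v5"` alongside a V5.1 config path.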

asset/0a3b5fb184936a83.txt

+175
Large diffs are not rendered by default.

asset/1.mp4

456 KB
Binary file not shown.

asset/pose.mp4

282 KB
Binary file not shown.

comfyui/README.md

+69 -31
@@ -6,16 +6,16 @@ Easily use EasyAnimate inside ComfyUI!
 [![Modelscope Studio](https://img.shields.io/badge/Modelscope-Studio-blue)](https://modelscope.cn/studios/PAI/EasyAnimate/summary)
 [![Hugging Face Spaces](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-yellow)](https://huggingface.co/spaces/alibaba-pai/EasyAnimate)
 
-- [Installation](#1-installation)
+English | [简体中文](./README_zh-CN.md)
+
+- [Installation](#installation)
 - [Node types](#node-types)
 - [Example workflows](#example-workflows)
-- [Image to video](#image-to-video)
-- [Image to video generation (high FPS w/ frame interpolation)](#image-to-video-generation-high-fps-w-frame-interpolation)
 
-## 1. Installation
+## Installation
 
 ### Option 1: Install via ComfyUI Manager
-TBD
+![](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5.1/ComfyUI_Manager.jpg)
 
 ### Option 2: Install manually
 The EasyAnimate repository needs to be placed at `ComfyUI/custom_nodes/EasyAnimate/`.
@@ -28,37 +28,50 @@ git clone https://github.com/aigc-apps/EasyAnimate.git
 
 # Git clone the video outout node
 git clone https://github.com/Kosinkadink/ComfyUI-VideoHelperSuite.git
+git clone https://github.com/kijai/ComfyUI-KJNodes.git
 
 cd EasyAnimate/
 pip install -r comfyui/requirements.txt
 ```
 
-### 2. Download models into `ComfyUI/models/EasyAnimate/`
+### Download models into `ComfyUI/models/EasyAnimate/`
+
+EasyAnimateV5.1:
+
+12B:
+| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
+|--|--|--|--|--|--|
+| EasyAnimateV5.1-12b-zh-InP | EasyAnimateV5.1 | 39 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-12b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-12b-zh-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports for multilingual prediction. |
+| EasyAnimateV5.1-12b-zh-Control | EasyAnimateV5.1 | 39 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-12b-zh-Control) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-12b-zh-Control) | Official video control weights, supporting various control conditions such as Canny, Depth, Pose, MLSD, and trajectory control. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports for multilingual prediction. |
+| EasyAnimateV5.1-12b-zh-Control-Camera | EasyAnimateV5.1 | 39 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-12b-zh-Control-Camera) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-12b-zh-Control-Camera) | Official video camera control weights, supporting direction generation control by inputting camera motion trajectories. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports for multilingual prediction. |
+| EasyAnimateV5.1-12b-zh | EasyAnimateV5.1 | 39 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5.1-12b-zh) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5.1-12b-zh) | Official text-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports for multilingual prediction. |
 
-EasyAnimateV5:
+<details>
+<summary>(Obsolete) EasyAnimateV5:</summary>
 
 | Name | Type | Storage Space | Hugging Face | Model Scope | Description |
 |--|--|--|--|--|--|
 | EasyAnimateV5-12b-zh-InP | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh-InP) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh-InP) | Official image-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
 | EasyAnimateV5-12b-zh-Control | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh-Control) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh-Control) | Official video control weights, supporting various control conditions such as Canny, Depth, Pose, MLSD, etc. Supports video prediction at multiple resolutions (512, 768, 1024) and is trained with 49 frames at 8 frames per second. Bilingual prediction in Chinese and English is supported. |
 | EasyAnimateV5-12b-zh | EasyAnimateV5 | 34 GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV5-12b-zh) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV5-12b-zh) | Official text-to-video weights. Supports video prediction at multiple resolutions (512, 768, 1024), trained with 49 frames at 8 frames per second, and supports bilingual prediction in Chinese and English. |
+</details>
 
 <details>
 <summary>(Obsolete) EasyAnimateV4:</summary>
 
-| Name | Type | Storage Space | Url | Hugging Face | Description |
+| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
 |--|--|--|--|--|--|
-| EasyAnimateV4-XL-2-InP.tar.gz | EasyAnimateV4 | Before extraction: 8.9 GB \/ After extraction: 14.0 GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV4-XL-2-InP.tar.gz) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV4-XL-2-InP)| Our official graph-generated video model is capable of predicting videos at multiple resolutions (512, 768, 1024, 1280) and has been trained on 144 frames at a rate of 24 frames per second. |
+| EasyAnimateV4-XL-2-InP | EasyAnimateV4 | Before extraction: 8.9 GB \/ After extraction: 14.0 GB |[🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV4-XL-2-InP)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV4-XL-2-InP)| | Our official graph-generated video model is capable of predicting videos at multiple resolutions (512, 768, 1024, 1280) and has been trained on 144 frames at a rate of 24 frames per second. |
 </details>
 
 <details>
 <summary>(Obsolete) EasyAnimateV3:</summary>
 
-| Name | Type | Storage Space | Url | Hugging Face | Description |
+| Name | Type | Storage Space | Hugging Face | Model Scope | Description |
 |--|--|--|--|--|--|
-| EasyAnimateV3-XL-2-InP-512x512.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-512x512.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-512x512) | EasyAnimateV3 official weights for 512x512 text and image to video resolution. Training with 144 frames and fps 24 |
-| EasyAnimateV3-XL-2-InP-768x768.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-768x768.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-768x768) | EasyAnimateV3 official weights for 768x768 text and image to video resolution. Training with 144 frames and fps 24 |
-| EasyAnimateV3-XL-2-InP-960x960.tar | EasyAnimateV3 | 18.2GB | [Download](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/Diffusion_Transformer/EasyAnimateV3-XL-2-InP-960x960.tar) | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-960x960) | EasyAnimateV3 official weights for 960x960 text and image to video resolution. Training with 144 frames and fps 24 |
+| EasyAnimateV3-XL-2-InP-512x512 | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-512x512)| [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-512x512) | EasyAnimateV3 official weights for 512x512 text and image to video resolution. Training with 144 frames and fps 24 |
+| EasyAnimateV3-XL-2-InP-768x768 | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-768x768) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-768x768) | EasyAnimateV3 official weights for 768x768 text and image to video resolution. Training with 144 frames and fps 24 |
+| EasyAnimateV3-XL-2-InP-960x960 | EasyAnimateV3 | 18.2GB | [🤗Link](https://huggingface.co/alibaba-pai/EasyAnimateV3-XL-2-InP-960x960) | [😄Link](https://modelscope.cn/models/PAI/EasyAnimateV3-XL-2-InP-960x960) | EasyAnimateV3 official weights for 960x960 text and image to video resolution. Training with 144 frames and fps 24 |
 </details>
 
 ## Node types
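
The manual install steps in this hunk place the repository at `ComfyUI/custom_nodes/EasyAnimate/` and the weights under `ComfyUI/models/EasyAnimate/`. A small sketch of a layout check is given below. It assumes each model sits in a subfolder named after the model (for example `EasyAnimateV5.1-12b-zh-InP`), which the README does not state explicitly, and the function name `missing_easyanimate_paths` is hypothetical.

```python
from pathlib import Path


def missing_easyanimate_paths(comfyui_root, model_name="EasyAnimateV5.1-12b-zh-InP"):
    """Return the expected directories that do not exist under `comfyui_root`.

    Assumption: weights live in ComfyUI/models/EasyAnimate/<model_name>/.
    """
    root = Path(comfyui_root)
    expected = [
        root / "custom_nodes" / "EasyAnimate",
        root / "models" / "EasyAnimate" / model_name,
    ]
    return [path for path in expected if not path.is_dir()]
```

An empty return value means both directories are present; otherwise the list names what still has to be cloned or downloaded.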
@@ -75,27 +88,52 @@ EasyAnimateV5:
 
 ## Example workflows
 
-### Video to video generation
-Our ui is shown as follow, this is the [download link](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5/easyanimatev5_workflow_v2v.json) of the json:
-![workflow graph](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5/easyanimatev5_workflow_v2v.jpg)
+### Text to Video Generation
+Our user interface is shown as follows, this is the [json](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5.1/easyanimatev5.1_workflow_t2v.json):
+
+![Workflow Diagram](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5.1/easyanimatev5.1_workflow_t2v.jpg)
+
+### Image to Video Generation
+Our user interface is shown as follows, this is the [json](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5.1/easyanimatev5.1_workflow_i2v.json):
+
+![Workflow Diagram](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5.1/easyanimatev5.1_workflow_i2v.jpg)
+
+You can run a demo using the following photo:
+
+![Demo Image](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/asset/v1/firework.png)
+
+### Video to Video Generation
+Our user interface is shown as follows, this is the [json](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5.1/easyanimatev5.1_workflow_v2v.json):
+
+![Workflow Diagram](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5.1/easyanimatev5.1_workflow_v2v.jpg)
+
+You can run a demo using the following video:
+
+[Demo Video](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/asset/v1/play_guitar.mp4)
+
+### Camera Control Video Generation
+Our user interface is shown as follows, this is the [json](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5.1/easyanimatev5.1_workflow_control_camera.json):
+
+![Workflow Diagram](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5.1/easyanimatev5.1_workflow_control_camera.jpg)
+
+You can run a demo using the following photo:
+
+![Demo Image](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/asset/v1/firework.png)
+
+### Trajectory Control Video Generation
+Our user interface is shown as follows, this is the [json](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5.1/easyanimatev5.1_workflow_control_trajectory.json):
+
+![Workflow Diagram](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5.1/easyanimatev5.1_workflow_control_trajectory.jpg)
 
-You can run the demo using following video:
-[demo video](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/asset/v1/play_guitar.mp4)
+You can run a demo using the following photo:
 
-### Control video generation
-Our ui is shown as follow, this is the [download link](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5/easyanimatev5_workflow_v2v_control.json) of the json:
-![workflow graph](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5/easyanimatev5_workflow_v2v_control.jpg)
+![Demo Image](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5.1/dog.png)
 
-You can run the demo using following video:
-[demo video](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/asset/v1.1/pose.mp4)
+### Control Video Generation
+Our user interface is shown as follows, this is the [json](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5/easyanimatev5.1_workflow_v2v_control.json):
 
-### Image to video generation
-Our ui is shown as follow, this is the [download link](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5/easyanimatev5_workflow_i2v.json) of the json:
-![workflow graph](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5/easyanimatev5_workflow_i2v.jpg)
+![Workflow Diagram](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5.1/easyanimatev5.1_workflow_v2v_control.jpg)
 
-You can run the demo using following photo:
-![demo image](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/asset/v1/firework.png)
+You can run a demo using the following video:
 
-### Text to video generation
-Our ui is shown as follow, this is the [download link](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5/easyanimatev5_workflow_t2v.json) of the json:
-![workflow graph](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate/asset/v5/easyanimatev5_workflow_t2v.jpg)
+[Demo Video](https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/cogvideox_fun/asset/v1.1/pose.mp4)
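
Most of the new workflow links in this hunk live under `asset/v5.1/`, but the JSON link in the "Control Video Generation" section points at `asset/v5/` while its file name and the matching diagram use `v5.1`. Whether that path is intentional cannot be determined from the diff alone. The sketch below shows one way a reviewer could flag such links automatically; the regular expression and the function name `version_mismatches` are assumptions for illustration.

```python
import re

# Captures the version directory and the version embedded in the file name,
# e.g. ".../asset/v5/easyanimatev5.1_workflow_..." -> ("v5", "v5.1").
ASSET_URL = re.compile(r"/asset/(v[\d.]+)/easyanimate(v[\d.]+)_workflow_")


def version_mismatches(urls):
    """Return (url, directory_version, file_version) for each mismatched URL."""
    mismatches = []
    for url in urls:
        match = ASSET_URL.search(url)
        if match and match.group(1) != match.group(2):
            mismatches.append((url, match.group(1), match.group(2)))
    return mismatches


BASE = "https://pai-aigc-photog.oss-cn-hangzhou.aliyuncs.com/easyanimate"
links = [
    f"{BASE}/asset/v5.1/easyanimatev5.1_workflow_t2v.json",
    f"{BASE}/asset/v5/easyanimatev5.1_workflow_v2v_control.json",
]
flagged = version_mismatches(links)
```

With the two links above, only the `v2v_control` JSON is flagged, with directory version `v5` and file version `v5.1`.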
