[PPDiffusers]fix bugs and release 0.29.0 (PaddlePaddle#742)
Co-authored-by: nifeng <nemonameless@qq.com>
westfish and nemonameless authored Oct 18, 2024
1 parent 13fb61b commit d51cf90
Showing 13 changed files with 206 additions and 141 deletions.
7 changes: 2 additions & 5 deletions ppdiffusers/README.md
@@ -20,6 +20,8 @@
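**PPDiffusers** is a domestically developed toolbox for training and inference of diffusion models across multiple modalities (such as text-image cross-modal, image, and audio), built on the [**PaddlePaddle**](https://www.paddlepaddle.org.cn/) framework and the [**PaddleNLP**](https://github.com/PaddlePaddle/PaddleNLP) natural language processing library.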

## News 📢
* 🔥 **2024.10.18: Released version 0.29.0. Added the image generation model [Stable Diffusion 3 (SD3)](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/examples/text_to_image/README_sd3.md), with support for DreamBooth training and high-performance inference; SD3 and SDXL are adapted to Ascend 910B, providing training and inference on domestic compute chips; DIT supports [high-performance inference](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/examples/class_conditional_image_generation/DiT/README.md#23-paddle-inference-%E9%AB%98%E6%80%A7%E8%83%BD%E6%8E%A8%E7%90%86); PaddleNLP 3.0 beta is supported.**

* 🔥 **2024.07.15: Released version 0.24.1. Added [Open-Sora](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/Open-Sora), with support for model training and inference; full support for Paddle 3.0.**

* 🔥 **2024.04.17: Released version 0.24.0. Supports [Sora-related techniques](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/sora); supports training and inference for [DiT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/class_conditional_image_generation/DiT), [SiT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/class_conditional_image_generation/DiT#exploring-flow-and-diffusion-based-generative-models-with-scalable-interpolant-transformers-sit), and [UViT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/text_to_image_mscoco_uvit); adds the [NaViT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/navit) and [MAGVIT-v2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/video_tokenizer/magvit2) models;
@@ -38,11 +40,6 @@ Stable Diffusion supports [BF16 O2 training](https://github.com/PaddlePaddle/PaddleMIX/
[LoRA loading upgraded](#加载HF-LoRA权重), with support for loading SDXL LoRA weights;
[Controlnet](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/ppdiffusers/pipelines/controlnet) upgraded, with support for ControlNetImg2Img, ControlNetInpaint, StableDiffusionXLControlNet, and others.**

* 🔥 **2023.06.20: Released version 0.16.1. Added [T2I-Adapter](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/t2i-adapter), with support for training and inference; ControlNet upgraded, with support for [reference-only inference](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#controlnet-reference-only); added [WebUIStableDiffusionPipeline](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#automatic1111-webui-stable-diffusion),
which supports dynamically loading lora and textual_inversion weights via the prompt;
added [StableDiffusionHiresFixPipeline](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#stable-diffusion-with-high-resolution-fixing), with support for high-resolution fixing;
added [COCOeval](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/scripts/cocoeval_keypoints_score), an evaluation metric for keypoint-controlled generation tasks;
added diffusion model pipelines for multiple modalities, including video generation ([Text-to-Video-Synth](#文本视频多模), [Text-to-Video-Zero](#文本视频多模)) and audio generation ([AudioLDM](#文本音频多模), [Spectrogram Diffusion](#音频)); added the text-to-image model [IF](#文本图像多模)**



2 changes: 1 addition & 1 deletion ppdiffusers/VERSION
@@ -1 +1 @@
0.24.1
0.29.0
7 changes: 4 additions & 3 deletions ppdiffusers/deploy/sd3/README.md
@@ -52,6 +52,7 @@ python -m paddle.distributed.launch --gpus 0,1 text_to_image_generation-stable_d
```
## Performance measured on an NVIDIA A800-SXM4-80GB:

| Paddle batch parallel | Paddle Single Card | PyTorch | Paddle dynamic graph |
| --------------------- | ------------------ | ------- | -------------------- |
| 0.86 s                | 1.2 s              | 1.78 s  | 4.202 s              |

| Paddle batch parallel | Paddle Single Card | PyTorch | TensorRT | Paddle dynamic graph |
| --------------------- | ------------------ | ------- | -------- | -------------------- |
| 0.86 s                | 1.2 s              | 1.78 s  | 1.16 s   | 4.202 s              |
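
As a rough reading of the figures above, the sketch below computes each configuration's speedup relative to the PyTorch baseline. The latencies are copied from the updated table; the dictionary and variable names are illustrative and not part of the repository.

```python
# Latencies in seconds, copied from the benchmark table above.
latency_s = {
    "Paddle batch parallel": 0.86,
    "Paddle Single Card": 1.2,
    "PyTorch": 1.78,
    "TensorRT": 1.16,
    "Paddle dynamic graph": 4.202,
}

# Speedup relative to the PyTorch baseline (values above 1 mean faster than PyTorch).
baseline = latency_s["PyTorch"]
speedup = {name: baseline / seconds for name, seconds in latency_s.items()}

for name, ratio in speedup.items():
    print(f"{name}: {ratio:.2f}x vs PyTorch")
```

Read this way, the batch-parallel configuration is roughly twice as fast as the PyTorch baseline, at the cost of using two GPUs.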
145 changes: 145 additions & 0 deletions ppdiffusers/deploy/sd3/text_to_image_generation-stable_diffusion_3.py
@@ -0,0 +1,145 @@
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import paddle
def parse_args():
    parser = argparse.ArgumentParser(
        description=" Use PaddleMIX to accelerate the Stable Diffusion3 image generation model."
    )
    parser.add_argument(
        "--benchmark",
        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
        default=False,
        help="if set to True, measure inference performance",
    )
    parser.add_argument(
        "--inference_optimize",
        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
        default=False,
        help="If set to True, all optimizations except Triton are enabled.",
    )
    parser.add_argument(
        "--inference_optimize_bp",
        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
        default=False,
        help="If set to True, batch parallel is enabled in DIT and dual-GPU acceleration is used.",
    )
    parser.add_argument("--height", type=int, default=512, help="Height of the generated image.")
    parser.add_argument("--width", type=int, default=512, help="Width of the generated image.")
    parser.add_argument("--num-inference-steps", type=int, default=50, help="Number of inference steps.")
    parser.add_argument("--dtype", type=str, default="float32", help="Inference data types.")

    return parser.parse_args()


args = parse_args()

if args.inference_optimize:
    os.environ["INFERENCE_OPTIMIZE"] = "True"
    os.environ["INFERENCE_OPTIMIZE_TRITON"] = "True"
if args.inference_optimize_bp:
    os.environ["INFERENCE_OPTIMIZE_BP"] = "True"
if args.dtype == "float32":
    inference_dtype = paddle.float32
elif args.dtype == "float16":
    inference_dtype = paddle.float16


if args.inference_optimize_bp:
    from paddle.distributed import fleet
    from paddle.distributed.fleet.utils import recompute
    import numpy as np
    import random
    import paddle.distributed as dist
    import paddle.distributed.fleet as fleet
    strategy = fleet.DistributedStrategy()
    model_parallel_size = 2
    data_parallel_size = 1
    strategy.hybrid_configs = {
        "dp_degree": data_parallel_size,
        "mp_degree": model_parallel_size,
        "pp_degree": 1
    }
    fleet.init(is_collective=True, strategy=strategy)
    hcg = fleet.get_hybrid_communicate_group()
    mp_id = hcg.get_model_parallel_rank()
    rank_id = dist.get_rank()

import datetime
from ppdiffusers import StableDiffusion3Pipeline


pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    paddle_dtype=inference_dtype,
)

pipe.transformer = paddle.incubate.jit.inference(
    pipe.transformer,
    save_model_dir="./tmp/sd3",
    enable_new_ir=True,
    cache_static_model=True,
    # On V100, set exp_enable_use_cutlass=False.
    exp_enable_use_cutlass=True,
    delete_pass_lists=["add_norm_fuse_pass"],
)

generator = paddle.Generator().manual_seed(42)
prompt = "A cat holding a sign that says hello world"


image = pipe(
    prompt, num_inference_steps=args.num_inference_steps, width=args.width, height=args.height, generator=generator
).images[0]

if args.benchmark:
    # warmup
    for i in range(3):
        image = pipe(
            prompt,
            num_inference_steps=args.num_inference_steps,
            width=args.width,
            height=args.height,
            generator=generator,
        ).images[0]

    repeat_times = 10
    sumtime = 0.0
    for i in range(repeat_times):
        paddle.device.synchronize()
        starttime = datetime.datetime.now()
        image = pipe(
            prompt,
            num_inference_steps=args.num_inference_steps,
            width=args.width,
            height=args.height,
            generator=generator,
        ).images[0]
        paddle.device.synchronize()
        endtime = datetime.datetime.now()
        duringtime = endtime - starttime
        duringtime = duringtime.seconds * 1000 + duringtime.microseconds / 1000.0
        sumtime += duringtime
        print("SD3 end to end time : ", duringtime, "ms")

    print("SD3 ave end to end time : ", sumtime / repeat_times, "ms")
    cuda_mem_after_used = paddle.device.cuda.max_memory_allocated() / (1024**3)
    print(f"Max used CUDA memory : {cuda_mem_after_used:.3f} GiB")

if args.inference_optimize_bp:
    if rank_id == 0:
        image.save("text_to_image_generation-stable_diffusion_3-result.png")
else:
    image.save("text_to_image_generation-stable_diffusion_3-result.png")
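
The script above relies on two small idioms: parsing boolean flags from strings, and timing a call after a warmup phase. The following is a minimal standalone sketch of both, assuming a generic callable in place of the pipeline. `str2bool` and `benchmark_ms` are hypothetical helper names that do not appear in the commit, and `time.perf_counter` is swapped in for `datetime.datetime.now()`. On a GPU the timed call would still need `paddle.device.synchronize()` before and after it, as the script does.

```python
import argparse
import time


def str2bool(value):
    """Treat 'true', '1', or 'yes' (case-insensitive) as True; anything else is False."""
    return str(value).lower() in ("true", "1", "yes")


def benchmark_ms(fn, warmup=3, repeat=10):
    """Return the mean wall-clock time of fn() in milliseconds, measured after warmup runs."""
    for _ in range(warmup):
        fn()
    total_ms = 0.0
    for _ in range(repeat):
        start = time.perf_counter()
        fn()
        total_ms += (time.perf_counter() - start) * 1000.0
    return total_ms / repeat


parser = argparse.ArgumentParser()
parser.add_argument("--benchmark", type=str2bool, default=False)
# An explicit argument list is passed here so the sketch runs without a command line.
args = parser.parse_args(["--benchmark", "True"])

if args.benchmark:
    mean_ms = benchmark_ms(lambda: sum(range(10_000)))
    print(f"mean time: {mean_ms:.3f} ms")
```

Because any unrecognized string maps to False, a typo such as `--benchmark Ture` silently disables the flag rather than raising an error.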
2 changes: 2 additions & 0 deletions ppdiffusers/examples/controlnet/requirements.txt
@@ -4,3 +4,5 @@ paddlenlp>=2.7.2
opencv-python
ppdiffusers>=0.24.0
cchardet
gradio==3.16.2
basicsr==1.4.2
2 changes: 1 addition & 1 deletion ppdiffusers/examples/dreambooth/README_sd3.md
@@ -131,7 +131,7 @@ import paddle
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers", paddle_dtype=paddle.float16
)
pipeline.load_lora_weights('your-lora-checkpoint')
pipe.load_lora_weights('your-lora-checkpoint')

image = pipe("A picture of a sks dog in a bucket", num_inference_steps=25).images[0]
image.save("sks_dog_dreambooth_lora.png")
@@ -20,6 +20,6 @@
url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/sketch-mountains-input.png"
init_image = load_image(url).resize((512, 512))
prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"
images = pipe(prompt=prompt, image=init_image, strength=0.95, guidance_scale=7.5).images[0]
image = pipe(prompt=prompt, image=init_image, strength=0.95, guidance_scale=7.5).images[0]

image.save("image_to_image_text_guided_generation-stable_diffusion_3-result.png")
@@ -1,4 +1,4 @@
# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -11,134 +11,13 @@
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import os
import argparse
import paddle
def parse_args():
    parser = argparse.ArgumentParser(
        description=" Use PaddleMIX to accelerate the Stable Diffusion3 image generation model."
    )
    parser.add_argument(
        "--benchmark",
        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
        default=False,
        help="if set to True, measure inference performance",
    )
    parser.add_argument(
        "--inference_optimize",
        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
        default=False,
        help="If set to True, all optimizations except Triton are enabled.",
    )
    parser.add_argument(
        "--inference_optimize_bp",
        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
        default=False,
        help="If set to True, batch parallel is enabled in DIT and dual-GPU acceleration is used.",
    )
    parser.add_argument("--height", type=int, default=512, help="Height of the generated image.")
    parser.add_argument("--width", type=int, default=512, help="Width of the generated image.")
    parser.add_argument("--num-inference-steps", type=int, default=50, help="Number of inference steps.")
    parser.add_argument("--dtype", type=str, default="float32", help="Inference data types.")

    return parser.parse_args()


args = parse_args()

if args.inference_optimize:
    os.environ["INFERENCE_OPTIMIZE"] = "True"
    os.environ["INFERENCE_OPTIMIZE_TRITON"] = "True"
if args.inference_optimize_bp:
    os.environ["INFERENCE_OPTIMIZE_BP"] = "True"
if args.dtype == "float32":
    inference_dtype = paddle.float32
elif args.dtype == "float16":
    inference_dtype = paddle.float16


if args.inference_optimize_bp:
    from paddle.distributed import fleet
    from paddle.distributed.fleet.utils import recompute
    import numpy as np
    import random
    import paddle.distributed as dist
    import paddle.distributed.fleet as fleet
    strategy = fleet.DistributedStrategy()
    model_parallel_size = 2
    data_parallel_size = 1
    strategy.hybrid_configs = {
        "dp_degree": data_parallel_size,
        "mp_degree": model_parallel_size,
        "pp_degree": 1
    }
    fleet.init(is_collective=True, strategy=strategy)
    hcg = fleet.get_hybrid_communicate_group()
    mp_id = hcg.get_model_parallel_rank()
    rank_id = dist.get_rank()

import datetime
import paddle
from ppdiffusers import StableDiffusion3Pipeline


pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    paddle_dtype=inference_dtype,
    "stabilityai/stable-diffusion-3-medium-diffusers", paddle_dtype=paddle.float16
)

pipe.transformer = paddle.incubate.jit.inference(
    pipe.transformer,
    save_model_dir="./tmp/sd3",
    enable_new_ir=True,
    cache_static_model=True,
    exp_enable_use_cutlass=True,
    delete_pass_lists=["add_norm_fuse_pass"],
)

generator = paddle.Generator().manual_seed(42)
prompt = "A cat holding a sign that says hello world"


image = pipe(
    prompt, num_inference_steps=args.num_inference_steps, width=args.width, height=args.height, generator=generator
).images[0]

if args.benchmark:
    # warmup
    for i in range(3):
        image = pipe(
            prompt,
            num_inference_steps=args.num_inference_steps,
            width=args.width,
            height=args.height,
            generator=generator,
        ).images[0]

    repeat_times = 10
    sumtime = 0.0
    for i in range(repeat_times):
        paddle.device.synchronize()
        starttime = datetime.datetime.now()
        image = pipe(
            prompt,
            num_inference_steps=args.num_inference_steps,
            width=args.width,
            height=args.height,
            generator=generator,
        ).images[0]
        paddle.device.synchronize()
        endtime = datetime.datetime.now()
        duringtime = endtime - starttime
        duringtime = duringtime.seconds * 1000 + duringtime.microseconds / 1000.0
        sumtime += duringtime
        print("SD3 end to end time : ", duringtime, "ms")

    print("SD3 ave end to end time : ", sumtime / repeat_times, "ms")
    cuda_mem_after_used = paddle.device.cuda.max_memory_allocated() / (1024**3)
    print(f"Max used CUDA memory : {cuda_mem_after_used:.3f} GiB")

if args.inference_optimize_bp:
    if rank_id == 0:
        image.save("text_to_image_generation-stable_diffusion_3-result.png")
else:
    image.save("text_to_image_generation-stable_diffusion_3-result.png")
image = pipe(prompt, generator=generator).images[0]
image.save("text_to_image_generation-stable_diffusion_3-result.png")
2 changes: 1 addition & 1 deletion ppdiffusers/examples/kandinsky2_2/text_to_image/README.md
@@ -124,7 +124,7 @@ prior_components = {"prior_" + k: v for k,v in pipe_prior.components.items()}
pipe = KandinskyV22CombinedPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", **prior_components)

prompt='A robot pokemon, 4k photo'
images = pipe(prompt=prompt, negative_prompt=negative_prompt).images
images = pipe(prompt=prompt).images
images[0]
```
