[PPDiffusers]fix bugs and release 0.29.0 (PaddlePaddle#742)

Co-authored-by: nifeng <nemonameless@qq.com>
ZhijunLStudio · Oct 18, 2024 · d51cf90
1 parent 13fb61b

Showing 13 changed files with 206 additions and 141 deletions.
diff --git a/ppdiffusers/README.md b/ppdiffusers/README.md
@@ -20,6 +20,8 @@
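 **PPDiffusers** is a domestically developed toolbox for training and inference of diffusion models across multiple modalities (such as text-image cross-modal, image, and audio), built on the [**PaddlePaddle**](https://www.paddlepaddle.org.cn/) framework and the [**PaddleNLP**](https://github.com/PaddlePaddle/PaddleNLP) natural language processing library.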
 
 ## News 📢
+* 🔥 **2024.10.18 Released version 0.29.0, adding the image generation model [Stable Diffusion 3 (SD3)](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/examples/text_to_image/README_sd3.md) with support for DreamBooth training and high-performance inference; adapted SD3 and SDXL to the Ascend 910B, enabling training and inference on domestic computing chips; added [high-performance inference](https://github.com/PaddlePaddle/PaddleMIX/blob/develop/ppdiffusers/examples/class_conditional_image_generation/DiT/README.md#23-paddle-inference-%E9%AB%98%E6%80%A7%E8%83%BD%E6%8E%A8%E7%90%86) for DiT; added support for the PaddleNLP 3.0 beta.**
+
 * 🔥 **2024.07.15 Released version 0.24.1, adding [Open-Sora](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/Open-Sora) with support for model training and inference; full support for Paddle 3.0.**
 
 * 🔥 **2024.04.17 Released version 0.24.0, supporting [Sora-related technologies](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/sora) and training and inference for [DiT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/class_conditional_image_generation/DiT), [SiT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/class_conditional_image_generation/DiT#exploring-flow-and-diffusion-based-generative-models-with-scalable-interpolant-transformers-sit), and [UViT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/text_to_image_mscoco_uvit); added the [NaViT](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/navit) and [MAGVIT-v2](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/video_tokenizer/magvit2) models;
@@ -38,11 +40,6 @@ Stable Diffusion supports [BF16 O2 training](https://github.com/PaddlePaddle/PaddleMIX/
 [LoRA loading upgrade](#加载HF-LoRA权重), supporting loading of SDXL LoRA weights;
 [Controlnet](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/ppdiffusers/pipelines/controlnet) upgrade, supporting ControlNetImg2Img, ControlNetInpaint, StableDiffusionXLControlNet, and more.**
 
-* 🔥 **2023.06.20 Released version 0.16.1, adding [T2I-Adapter](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/t2i-adapter) with training and inference support; ControlNet upgraded to support [reference-only inference](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#controlnet-reference-only); added [WebUIStableDiffusionPipeline](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#automatic1111-webui-stable-diffusion),
-which supports dynamically loading LoRA and textual_inversion weights through the prompt;
-added [StableDiffusionHiresFixPipeline](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/examples/community#stable-diffusion-with-high-resolution-fixing), supporting high-resolution fixing;
-added [COCOeval](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/ppdiffusers/scripts/cocoeval_keypoints_score), an evaluation metric for keypoint-controlled generation tasks;
-added diffusion model pipelines for multiple modalities, including video generation ([Text-to-Video-Synth](#文本视频多模), [Text-to-Video-Zero](#文本视频多模)) and audio generation ([AudioLDM](#文本音频多模), [Spectrogram Diffusion](#音频)); added the text-to-image model [IF](#文本图像多模).**
 
 
 

diff --git a/ppdiffusers/VERSION b/ppdiffusers/VERSION
@@ -1 +1 @@
-0.24.1
+0.29.0
diff --git a/ppdiffusers/deploy/sd3/README.md b/ppdiffusers/deploy/sd3/README.md
@@ -52,6 +52,7 @@ python -m paddle.distributed.launch --gpus 0,1 text_to_image_generation-stable_d
 ```
 ## Performance measured on NVIDIA A800-SXM4-80GB:
 
-| Paddle batch parallel | Paddle Single Card |  PyTorch  | Paddle dynamic graph |
-| --------------------- | ------------------ | --------- | ------------ |
-|          0.86 s       |        1.2 s       |   1.78 s  |    4.202 s   |
+
+| Paddle batch parallel | Paddle Single Card |  PyTorch  | TensorRT | Paddle dynamic graph |
+| --------------------- | ------------------ | --------- | -------- | ------------ |
+|          0.86 s       |        1.2 s       |   1.78 s  |  1.16 s  |    4.202 s   |
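The table does not say exactly which metric it reports. Assuming each figure is the end-to-end latency in seconds for one generated image (the deploy script's `--benchmark` mode prints an average end-to-end time), the relative speedups over PyTorch can be worked out directly from the five numbers. This is a plain-arithmetic sketch; `speedup_vs` is a hypothetical helper, not part of PPDiffusers, and the "Paddle dynamic graph" key is the English label for the dynamic-graph column.

```python
# Latencies from the A800-SXM4-80GB table (assumed: seconds per image, end to end).
latencies_s = {
    "Paddle batch parallel": 0.86,
    "Paddle Single Card": 1.2,
    "PyTorch": 1.78,
    "TensorRT": 1.16,
    "Paddle dynamic graph": 4.202,
}


def speedup_vs(baseline: str, table: dict[str, float]) -> dict[str, float]:
    """Return baseline_time / time for every entry; values > 1 are faster than the baseline."""
    base = table[baseline]
    return {name: round(base / t, 2) for name, t in table.items()}


if __name__ == "__main__":
    for name, factor in speedup_vs("PyTorch", latencies_s).items():
        print(f"{name:>22}: {factor:.2f}x vs PyTorch")
```

Under that assumption, two-card batch parallel comes out at about 2.07x PyTorch, a single card at about 1.48x, TensorRT at about 1.53x, and the Paddle dynamic graph at about 0.42x (that is, slower).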
diff --git a/ppdiffusers/deploy/sd3/text_to_image_generation-stable_diffusion_3.py b/ppdiffusers/deploy/sd3/text_to_image_generation-stable_diffusion_3.py
@@ -0,0 +1,145 @@
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+import os
+import argparse
+import paddle
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description="Use PaddleMIX to accelerate the Stable Diffusion 3 image generation model."
+    )
+    parser.add_argument(
+        "--benchmark",
+        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
+        default=False,
+        help="if set to True, measure inference performance",
+    )
+    parser.add_argument(
+        "--inference_optimize",
+        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
+        default=False,
+        help="If set to True, all optimizations except Triton are enabled.",
+    )
+    parser.add_argument(
+        "--inference_optimize_bp",
+        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
+        default=False,
+        help="If set to True, batch parallel is enabled in DiT and dual-GPU acceleration is used.",
+    )
+    parser.add_argument("--height", type=int, default=512, help="Height of the generated image.")
+    parser.add_argument("--width", type=int, default=512, help="Width of the generated image.")
+    parser.add_argument("--num-inference-steps", type=int, default=50, help="Number of inference steps.")
+    parser.add_argument("--dtype", type=str, default="float32", choices=["float32", "float16"], help="Inference data type.")
+
+    return parser.parse_args()
+
+
+args = parse_args()
+
+if args.inference_optimize:
+    os.environ["INFERENCE_OPTIMIZE"] = "True"
+    os.environ["INFERENCE_OPTIMIZE_TRITON"] = "True"
+if args.inference_optimize_bp:
+    os.environ["INFERENCE_OPTIMIZE_BP"] = "True"
+if args.dtype == "float32":
+    inference_dtype = paddle.float32
+elif args.dtype == "float16":
+    inference_dtype = paddle.float16
+
+
+if args.inference_optimize_bp:
+    import paddle.distributed as dist
+    from paddle.distributed import fleet
+
+    strategy = fleet.DistributedStrategy()
+    model_parallel_size = 2
+    data_parallel_size = 1
+    strategy.hybrid_configs = {
+        "dp_degree": data_parallel_size,
+        "mp_degree": model_parallel_size,
+        "pp_degree": 1,
+    }
+    fleet.init(is_collective=True, strategy=strategy)
+    hcg = fleet.get_hybrid_communicate_group()
+    mp_id = hcg.get_model_parallel_rank()
+    rank_id = dist.get_rank()
+
+import datetime
+from ppdiffusers import StableDiffusion3Pipeline
+
+
+pipe = StableDiffusion3Pipeline.from_pretrained(
+    "stabilityai/stable-diffusion-3-medium-diffusers",
+    paddle_dtype=inference_dtype,
+)
+
+pipe.transformer = paddle.incubate.jit.inference(
+    pipe.transformer,
+    save_model_dir="./tmp/sd3",
+    enable_new_ir=True,
+    cache_static_model=True,
+    # On V100, set exp_enable_use_cutlass=False.
+    exp_enable_use_cutlass=True,
+    delete_pass_lists=["add_norm_fuse_pass"],
+)
+
+generator = paddle.Generator().manual_seed(42)
+prompt = "A cat holding a sign that says hello world"
+
+
+image = pipe(
+    prompt, num_inference_steps=args.num_inference_steps, width=args.width, height=args.height, generator=generator
+).images[0]
+
+if args.benchmark:
+    # warmup
+    for i in range(3):
+        image = pipe(
+            prompt,
+            num_inference_steps=args.num_inference_steps,
+            width=args.width,
+            height=args.height,
+            generator=generator,
+        ).images[0]
+
+    repeat_times = 10
+    sumtime = 0.0
+    for i in range(repeat_times):
+        paddle.device.synchronize()
+        starttime = datetime.datetime.now()
+        image = pipe(
+            prompt,
+            num_inference_steps=args.num_inference_steps,
+            width=args.width,
+            height=args.height,
+            generator=generator,
+        ).images[0]
+        paddle.device.synchronize()
+        endtime = datetime.datetime.now()
+        duringtime = endtime - starttime
+        duringtime = duringtime.seconds * 1000 + duringtime.microseconds / 1000.0
+        sumtime += duringtime
+        print("SD3 end to end time : ", duringtime, "ms")
+
+    print("SD3 ave end to end time : ", sumtime / repeat_times, "ms")
+    cuda_mem_after_used = paddle.device.cuda.max_memory_allocated() / (1024**3)
+    print(f"Max used CUDA memory : {cuda_mem_after_used:.3f} GiB")
+
+if args.inference_optimize_bp:
+    if rank_id == 0:
+        image.save("text_to_image_generation-stable_diffusion_3-result.png")
+else:
+    image.save("text_to_image_generation-stable_diffusion_3-result.png")
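The `--benchmark` branch of the script follows a common pattern: a few warm-up runs, then repeated timed runs bracketed by device synchronization, averaged at the end. A minimal sketch of that pattern as a reusable helper, assuming only the standard library: `benchmark` is a hypothetical name, and `time.perf_counter` (a monotonic, high-resolution clock) is swapped in for the script's `datetime.datetime.now()` arithmetic. With Paddle, you would pass `sync=paddle.device.synchronize` to mirror the script's calls before and after each timed run.

```python
import time
from typing import Callable, Optional


def benchmark(
    fn: Callable[[], object],
    warmup: int = 3,
    repeat: int = 10,
    sync: Optional[Callable[[], None]] = None,
) -> tuple[float, list[float]]:
    """Warm up, then time `repeat` calls of fn(); return (average_ms, per_run_ms).

    `sync` (hypothetical hook) is called before starting and after finishing each
    timed run so that queued device work is included in the measurement.
    """
    for _ in range(warmup):
        fn()  # untimed warm-up runs (e.g. compilation, caching)
    per_run_ms: list[float] = []
    for _ in range(repeat):
        if sync is not None:
            sync()  # make sure earlier work has finished before starting the clock
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()  # wait for this run's work before stopping the clock
        per_run_ms.append((time.perf_counter() - start) * 1000.0)
    return sum(per_run_ms) / len(per_run_ms), per_run_ms


if __name__ == "__main__":
    # Stand-in CPU workload; in the script this would wrap pipe(prompt, ...).
    avg_ms, runs = benchmark(lambda: sum(i * i for i in range(100_000)), warmup=1, repeat=5)
    print(f"average: {avg_ms:.2f} ms over {len(runs)} runs")
```

In the deploy script, usage would look like `benchmark(lambda: pipe(prompt, generator=generator), sync=paddle.device.synchronize)`. Keeping the warm-up outside the timed loop matters there because the first calls trigger `paddle.incubate.jit.inference` model export and caching.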
diff --git a/ppdiffusers/examples/controlnet/requirements.txt b/ppdiffusers/examples/controlnet/requirements.txt
@@ -4,3 +4,5 @@ paddlenlp>=2.7.2
 opencv-python
 ppdiffusers>=0.24.0
 cchardet
+gradio==3.16.2
+basicsr==1.4.2
diff --git a/ppdiffusers/examples/dreambooth/README_sd3.md b/ppdiffusers/examples/dreambooth/README_sd3.md
@@ -131,7 +131,7 @@ import paddle
 pipe = StableDiffusion3Pipeline.from_pretrained(
     "stabilityai/stable-diffusion-3-medium-diffusers", paddle_dtype=paddle.float16
 )
-pipeline.load_lora_weights('your-lora-checkpoint')
+pipe.load_lora_weights('your-lora-checkpoint')
 
 image = pipe("A picture of a sks dog in a bucket", num_inference_steps=25).images[0]
 image.save("sks_dog_dreambooth_lora.png")

diff --git a/ppdiffusers/examples/inference/image_to_image_text_guided_generation-stable_diffusion_3.py b/ppdiffusers/examples/inference/image_to_image_text_guided_generation-stable_diffusion_3.py
@@ -20,6 +20,6 @@
 url = "https://paddlenlp.bj.bcebos.com/models/community/CompVis/stable-diffusion-v1-4/sketch-mountains-input.png"
 init_image = load_image(url).resize((512, 512))
 prompt = "cat wizard, gandalf, lord of the rings, detailed, fantasy, cute, adorable, Pixar, Disney, 8k"
-images = pipe(prompt=prompt, image=init_image, strength=0.95, guidance_scale=7.5).images[0]
+image = pipe(prompt=prompt, image=init_image, strength=0.95, guidance_scale=7.5).images[0]
 
 image.save("image_to_image_text_guided_generation-stable_diffusion_3-result.png")
diff --git a/ppdiffusers/examples/inference/text_to_image_generation-stable_diffusion_3.py b/ppdiffusers/examples/inference/text_to_image_generation-stable_diffusion_3.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2022 PaddlePaddle Authors. All Rights Reserved.
+# Copyright (c) 2024 PaddlePaddle Authors. All Rights Reserved.
 #
 # Licensed under the Apache License, Version 2.0 (the "License");
 # you may not use this file except in compliance with the License.
@@ -11,134 +11,13 @@
 # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 # See the License for the specific language governing permissions and
 # limitations under the License.
-import os
-import argparse
-import paddle
-def parse_args():
-    parser = argparse.ArgumentParser(
-        description=" Use PaddleMIX to accelerate the Stable Diffusion3 image generation model."
-    )
-    parser.add_argument(
-        "--benchmark",
-        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
-        default=False,
-        help="if set to True, measure inference performance",
-    )
-    parser.add_argument(
-        "--inference_optimize",
-        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
-        default=False,
-        help="If set to True, all optimizations except Triton are enabled.",
-    )
-    parser.add_argument(
-        "--inference_optimize_bp",
-        type=(lambda x: str(x).lower() in ["true", "1", "yes"]),
-        default=False,
-        help="If set to True, batch parallel is enabled in DIT and dual-GPU acceleration is used.",
-    )
-    parser.add_argument("--height", type=int, default=512, help="Height of the generated image.")
-    parser.add_argument("--width", type=int, default=512, help="Width of the generated image.")
-    parser.add_argument("--num-inference-steps", type=int, default=50, help="Number of inference steps.")
-    parser.add_argument("--dtype", type=str, default="float32", help="Inference data types.")
-
-    return parser.parse_args()
-
-
-args = parse_args()
-
-if args.inference_optimize:
-    os.environ["INFERENCE_OPTIMIZE"] = "True"
-    os.environ["INFERENCE_OPTIMIZE_TRITON"] = "True"
-if args.inference_optimize_bp:
-    os.environ["INFERENCE_OPTIMIZE_BP"] = "True"
-if args.dtype == "float32":
-    inference_dtype = paddle.float32
-elif args.dtype == "float16":
-    inference_dtype = paddle.float16
 
-
-if args.inference_optimize_bp:
-    from paddle.distributed import fleet
-    from paddle.distributed.fleet.utils import recompute
-    import numpy as np
-    import random
-    import paddle.distributed as dist
-    import paddle.distributed.fleet as fleet
-    strategy = fleet.DistributedStrategy()
-    model_parallel_size = 2
-    data_parallel_size = 1
-    strategy.hybrid_configs = {
-    "dp_degree": data_parallel_size,
-    "mp_degree": model_parallel_size,
-    "pp_degree": 1
-    }
-    fleet.init(is_collective=True, strategy=strategy)
-    hcg = fleet.get_hybrid_communicate_group()
-    mp_id = hcg.get_model_parallel_rank()
-    rank_id = dist.get_rank()
-
-import datetime
+import paddle
 from ppdiffusers import StableDiffusion3Pipeline
-
-
 pipe = StableDiffusion3Pipeline.from_pretrained(
-    "stabilityai/stable-diffusion-3-medium-diffusers",
-    paddle_dtype=inference_dtype,
+    "stabilityai/stable-diffusion-3-medium-diffusers", paddle_dtype=paddle.float16
 )
-
-pipe.transformer = paddle.incubate.jit.inference(
-    pipe.transformer,
-    save_model_dir="./tmp/sd3",
-    enable_new_ir=True,
-    cache_static_model=True,
-    exp_enable_use_cutlass=True,
-    delete_pass_lists=["add_norm_fuse_pass"],
-)
-
 generator = paddle.Generator().manual_seed(42)
 prompt = "A cat holding a sign that says hello world"
-
-
-image = pipe(
-    prompt, num_inference_steps=args.num_inference_steps, width=args.width, height=args.height, generator=generator
-).images[0]
-
-if args.benchmark:
-    # warmup
-    for i in range(3):
-        image = pipe(
-            prompt,
-            num_inference_steps=args.num_inference_steps,
-            width=args.width,
-            height=args.height,
-            generator=generator,
-        ).images[0]
-
-    repeat_times = 10
-    sumtime = 0.0
-    for i in range(repeat_times):
-        paddle.device.synchronize()
-        starttime = datetime.datetime.now()
-        image = pipe(
-            prompt,
-            num_inference_steps=args.num_inference_steps,
-            width=args.width,
-            height=args.height,
-            generator=generator,
-        ).images[0]
-        paddle.device.synchronize()
-        endtime = datetime.datetime.now()
-        duringtime = endtime - starttime
-        duringtime = duringtime.seconds * 1000 + duringtime.microseconds / 1000.0
-        sumtime += duringtime
-        print("SD3 end to end time : ", duringtime, "ms")
-
-    print("SD3 ave end to end time : ", sumtime / repeat_times, "ms")
-    cuda_mem_after_used = paddle.device.cuda.max_memory_allocated() / (1024**3)
-    print(f"Max used CUDA memory : {cuda_mem_after_used:.3f} GiB")
-
-if args.inference_optimize_bp:
-    if rank_id == 0:
-        image.save("text_to_image_generation-stable_diffusion_3-result.png")
-else:
-    image.save("text_to_image_generation-stable_diffusion_3-result.png")
+image = pipe(prompt, generator=generator).images[0]
+image.save("text_to_image_generation-stable_diffusion_3-result.png")
diff --git a/...ers/examples/inference/sd15_infer_demo.py → ...tion-stable_diffusion_paddle_inference.py b/...ers/examples/inference/sd15_infer_demo.py → ...tion-stable_diffusion_paddle_inference.py
diff --git a/ppdiffusers/examples/kandinsky2_2/text_to_image/README.md b/ppdiffusers/examples/kandinsky2_2/text_to_image/README.md
@@ -124,7 +124,7 @@ prior_components = {"prior_" + k: v for k,v in pipe_prior.components.items()}
 pipe = KandinskyV22CombinedPipeline.from_pretrained("kandinsky-community/kandinsky-2-2-decoder", **prior_components)
 
 prompt='A robot pokemon, 4k photo'
-images = pipe(prompt=prompt, negative_prompt=negative_prompt).images
+images = pipe(prompt=prompt).images
 images[0]
 ```