Skip to content

Commit

Permalink
Hackathon No.106 add Stable Diffusion Pipeline Attend and excite (#5296)
Browse files Browse the repository at this point in the history
* add attend_and_excite

* rewrite AttendExciteCrossAttnProcessor by flatten

* check code style

* add record

* modify sample code in README

* move markdown file

* remove md file

* remove unusing code, replace paddle.equal by paddle.equal_all
  • Loading branch information
Yam0214 authored Mar 27, 2023
1 parent 07a5874 commit c2b8481
Show file tree
Hide file tree
Showing 7 changed files with 1,115 additions and 0 deletions.
44 changes: 44 additions & 0 deletions ppdiffusers/examples/reproduce/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# Stable Diffusion

## Using `StableDiffusionAttendAndExcitePipeline` with `PNDMScheduler`

Given a pre-trained text-to-image diffusion model (e.g., Stable Diffusion) the method, Attend-and-Excite, guides the generative model to modify the cross-attention values during the image synthesis process to generate images that more faithfully depict the input text prompt.

You can run this pipeline as so:

```python
from pathlib import Path
import paddle
from ppdiffusers import StableDiffusionAttendAndExcitePipeline, PNDMScheduler


scheduler = PNDMScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler")
pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
scheduler=scheduler,
)

seed = 123
prompt = "A playful kitten chasing a butterfly in a wildflower meadow"
token_indices = [3,6,10]

generator = paddle.Generator().manual_seed(seed)
image = pipe(
prompt=prompt,
token_indices=token_indices,
generator=generator,
).images[0]

# save
output_dir = Path("output_pd")
prompt_output_path = output_dir / prompt
prompt_output_path.mkdir(exist_ok=True, parents=True)
image.save(prompt_output_path / f'{seed}.png')

```

And the above script running in V100-32GB generates the following image:

<center>
<img src="https://user-images.githubusercontent.com/40912707/226089491-0f3f66c2-3c88-4518-9ee4-d77debd50e9e.png">
</center>
45 changes: 45 additions & 0 deletions ppdiffusers/examples/reproduce/README_cn.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,45 @@
# Stable Diffusion

## 使用 `StableDiffusionAttendAndExcitePipeline``PNDMScheduler`

给定一个预先训练好的文本到图像的扩散模型(例如Stable Diffusion),方法`Attend-and-Excite`能引导生成模型在图像生成过程中修改交叉注意力的数值,使得生成图片更忠实于输入文本的提示。

使用该pipeline的示例代码如下

```python

from pathlib import Path
import paddle
from ppdiffusers import StableDiffusionAttendAndExcitePipeline, PNDMScheduler


scheduler = PNDMScheduler.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="scheduler")
pipe = StableDiffusionAttendAndExcitePipeline.from_pretrained(
"CompVis/stable-diffusion-v1-4",
scheduler=scheduler,
)

seed = 123
prompt = "A playful kitten chasing a butterfly in a wildflower meadow"
token_indices = [3,6,10]

generator = paddle.Generator().manual_seed(seed)
image = pipe(
prompt=prompt,
token_indices=token_indices,
generator=generator,
).images[0]

# save
output_dir = Path("output_pd")
prompt_output_path = output_dir / prompt
prompt_output_path.mkdir(exist_ok=True, parents=True)
image.save(prompt_output_path / f'{seed}.png')

```

在V100-32GB显卡运行上述代码生成结果如下:

<center>
<img src="https://user-images.githubusercontent.com/40912707/226089491-0f3f66c2-3c88-4518-9ee4-d77debd50e9e.png">
</center>
69 changes: 69 additions & 0 deletions ppdiffusers/examples/reproduce/align_record.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,69 @@


## 示例prompt

[Hugging Face Spaces](https://huggingface.co/spaces/AttendAndExcite/Attend-and-Excite)选用的prompt和seed为例展示对齐效果如下表:`image_error`一栏分别展示了ppdiffusers和diffusers生成的图片及其绝对误差和相对误差在像素点的分布;`latents_error`一栏展示了模型在去噪过程中每一步生成的潜变量的最大绝对误差(蓝色线条)和最大相对误差(橙色线条)。

| prompt | seed | image_error | latents_error |
| :----------------------------------------------------------- | :--: | ------------------------------------------------------------ | ------------------------------------------------------------ |
| A grizzly bear catching a salmon in a crystal clear river surrounded by a forest | 123 | ![123_error_map](https://user-images.githubusercontent.com/40912707/226089730-215319e0-e9f5-4b7a-b51d-593b82a2e0a2.png) | ![123](https://user-images.githubusercontent.com/40912707/226089728-52b2f185-8cf1-4f39-96a2-9d9bae1e2baf.png) |
| A horse and a dog | 123 | ![123_error_map](https://user-images.githubusercontent.com/40912707/226089753-fa78e6d8-c241-4436-be1b-e63ac8318199.png) | !![123](https://user-images.githubusercontent.com/40912707/226089754-18c1b9ce-0287-4f04-8b85-4dbcdb0871c8.png) |
| A mouse and a red car | 2098 | ![2098_error_map](https://user-images.githubusercontent.com/40912707/226089872-9a6fa7a7-f8e5-45fd-acd2-a9ae5e7df6af.png) | ![2098](https://user-images.githubusercontent.com/40912707/226089874-7004bb01-5e19-4c5a-bd55-af741d9e1e38.png) |
| A painting of an elephant with glasses | 123 | ![123_error_map](https://user-images.githubusercontent.com/40912707/226089922-3d833b5c-a522-47b2-a9ed-5d95f2e4b411.png) | ![123](https://user-images.githubusercontent.com/40912707/226089923-35513880-6de1-4da9-bd57-4c747471c359.png) |
| A playful kitten chasing a butterfly in a wildflower meadow | 123 | ![123_error_map](https://user-images.githubusercontent.com/40912707/226089933-5fd1408a-ea13-4329-8f86-391fad46cce8.png) | ![123](https://user-images.githubusercontent.com/40912707/226089934-ce392b7a-e672-46d2-a5e3-d66d23dc66ed.png) |
| A pod of dolphins leaping out of the water in an ocean with a ship on the background | 123 | ![123_error_map](https://user-images.githubusercontent.com/40912707/226089943-b5def711-deb0-4cb2-9589-79693fd46a4f.png) | ![123](https://user-images.githubusercontent.com/40912707/226089945-3e7c37a4-17d5-4d26-99ee-40812957dea3.png) |





## 更多的prompt


按照论文的prompt构造方法尝试更多的prompt和seed对比ppdiffuser和diffusers的表现。

三种prompt模板分别如下,加粗的token表示使用注意力的token:

* a **{animal}** and a **{animal}**
* a **{animal}** and a {color} **{object}**
* a {colorA} **{objectA}** and a {colorB} **{objectB}**

其中用于填充的单词如下表

| category | words |
| -------- | ------------------------------------------------------------ |
| animals | cat, dog, bird, bear, lion, horse, elephant, monkey, frog, turtle, rabbit, mouse |
| objects | backpack, glasses, crown, suitcase, chair, balloon, bow, car, bowl, bench, clock, apple |
| colors | red, orange, yellow, green, blue, purple, pink, brown, gray, black, white |

对每一个prompt再随机生成5个seed,对比结果绘制散点图如下,其中每个点代表该prompt和seed的设置下在该step上的误差,颜色越亮则表示散点越聚集:

| 绝对误差 | 相对误差 |
| -------------------------- | -------------------------- |
| ![align_record_atol](https://user-images.githubusercontent.com/40912707/226089978-a760e900-9309-4058-9075-b5853fcda549.png) | ![align_record_rtol](https://user-images.githubusercontent.com/40912707/226089977-93d9e502-73ce-4b53-ab3d-066f603ce8d6.png) |

在测试的105组prompt和seed中,有13个prompt和seed的组合生成的图片有较大差别,记录如下:

| prompt | seed | image_error | latents_error |
| :------------------------------------- | ---- | ------------------------------------------------------------ | ----------------------------------------------------------- |
| a dog and a pink crown | 1930 | ![1930_error_map](https://user-images.githubusercontent.com/40912707/226090081-1448b7af-db20-4650-b181-44d02421a8fd.png) | ![1930](https://user-images.githubusercontent.com/40912707/226090079-52dfc07d-db0b-46b6-a870-dee4304a19de.png) |
| a mouse and a yellow apple | 28 | ![28_error_map](https://user-images.githubusercontent.com/40912707/226090224-ccda6bfc-d0c2-4332-baac-be4ca92f09ae.png) | ![28](https://user-images.githubusercontent.com/40912707/226090236-3d362322-c6c3-4d9c-9769-60bdf554c030.png) |
| a dog and a pink crown | 1285 | ![1285_error_map](https://user-images.githubusercontent.com/40912707/226090083-1a654c9f-8c79-45c6-ac4c-88065a9526a0.png) | ![1285](https://user-images.githubusercontent.com/40912707/226090082-134c5aa3-7c8d-4481-acde-f4303db3f8d5.png) |
| a mouse and a yellow apple | 60 | ![60_error_map](https://user-images.githubusercontent.com/40912707/226090232-9b5e09d8-139b-4b75-9bd8-463443cb57e2.png) |![60](https://user-images.githubusercontent.com/40912707/226090231-dcd6aadb-185d-49c8-925f-ef4c491a4197.png) |
| a mouse and a yellow apple | 49 | ![49_error_map](https://user-images.githubusercontent.com/40912707/226090227-0d760233-c9d4-4a54-88c5-cb044ae15571.png) | ![49](https://user-images.githubusercontent.com/40912707/226090226-f31ca86d-5947-471c-904c-173d5fd515ad.png) |
| a white chair and a green chair | 933 | ![933_error_map](https://user-images.githubusercontent.com/40912707/226090329-c585ffcf-e17a-4cd1-94e1-ac91f3354f0b.png) | ![933](https://user-images.githubusercontent.com/40912707/226090328-6fab40a9-d9e8-457d-8d99-d37bf860f215.png) |
| a dog and a black bow | 148 | ![148_error_map](https://user-images.githubusercontent.com/40912707/226090404-344e8004-ee42-454a-9b08-de19bfc85ad8.png) | ![148](https://user-images.githubusercontent.com/40912707/226090406-d03b9529-7667-4520-b6be-fbbd558230ae.png) |
| A horse and a dog | 27 | ![27_error_map](https://user-images.githubusercontent.com/40912707/226090430-0827b75c-19b6-4357-b475-28edf4b8cadc.png) | ![27](https://user-images.githubusercontent.com/40912707/226090432-4523dc64-9ce7-42ef-bdf2-8e192cebb212.png)|
| a horse and a gray bow | 1753 | !![1743_error_map](https://user-images.githubusercontent.com/40912707/226090077-637bcfff-cbfa-44c9-a5c9-6a482526a734.png) | ![1743](https://user-images.githubusercontent.com/40912707/226090074-e84b8ab2-27db-444a-9f73-142e91e3a286.png) |
| A painting of an elephant with glasses | 1860 | ![1860_error_map](https://user-images.githubusercontent.com/40912707/226090464-2554b7ca-c40b-4b8b-b746-1529050e91c7.png) |![1860](https://user-images.githubusercontent.com/40912707/226090462-c67f7210-9026-49bc-bf6d-b70dc626be46.png)|
| a horse and a gray bow | 1597 | ![1597_error_map](https://user-images.githubusercontent.com/40912707/226090499-c7ae25d1-59b7-4bfb-b318-048e6c180a92.png) |![1597](https://user-images.githubusercontent.com/40912707/226090498-f2fc3617-843f-4968-8369-3db92b2ad8b5.png) |
| a dog and a pink crown | 1743 | ![1743_error_map](https://user-images.githubusercontent.com/40912707/226090529-b56f5e0c-0808-4dfc-b7be-d7f0dfa01696.png)| ![1743](https://user-images.githubusercontent.com/40912707/226090528-5425e457-11a6-4985-b17b-237eae93c138.png) |
| a white chair and a green chair | 890 |![890_error_map](https://user-images.githubusercontent.com/40912707/226090327-929f60ca-5eb3-4019-a19b-db1ad8a307d2.png) | ![890](https://user-images.githubusercontent.com/40912707/226090332-d8b281ed-2a2b-4964-8c07-e0120990b344.png) |



#### 由于存在梯度,同样的prompt和seed设置下多次运行依旧存在误差

| diffusers | ppdiffusers |
| ------------------------------------------------------------ | ------------------------------------------------------------ |
| ![image-20230318002849117](https://user-images.githubusercontent.com/40912707/226090011-7d4fb85b-229f-4d62-b6bf-b7166c9341ef.png) | ![image-20230318002900522](https://user-images.githubusercontent.com/40912707/226090012-1ffe4095-fa18-46fb-8b11-f9b8369c79a6.png) |
1 change: 1 addition & 0 deletions ppdiffusers/ppdiffusers/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -115,6 +115,7 @@
LDMBertModel,
LDMTextToImagePipeline,
PaintByExamplePipeline,
StableDiffusionAttendAndExcitePipeline,
StableDiffusionControlNetPipeline,
StableDiffusionDepth2ImgPipeline,
StableDiffusionImageVariationPipeline,
Expand Down
1 change: 1 addition & 0 deletions ppdiffusers/ppdiffusers/pipelines/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,7 @@
from .paint_by_example import PaintByExamplePipeline
from .stable_diffusion import (
CycleDiffusionPipeline,
StableDiffusionAttendAndExcitePipeline,
StableDiffusionControlNetPipeline,
StableDiffusionDepth2ImgPipeline,
StableDiffusionImageVariationPipeline,
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,9 @@ class StableDiffusionPipelineOutput(BaseOutput):
from .pipeline_cycle_diffusion import CycleDiffusionPipeline
from .pipeline_stable_diffusion import StableDiffusionPipeline
from .pipeline_stable_diffusion_all_in_one import StableDiffusionPipelineAllinOne
from .pipeline_stable_diffusion_attend_and_excite import (
StableDiffusionAttendAndExcitePipeline,
)
from .pipeline_stable_diffusion_controlnet import StableDiffusionControlNetPipeline
from .pipeline_stable_diffusion_img2img import StableDiffusionImg2ImgPipeline
from .pipeline_stable_diffusion_inpaint import StableDiffusionInpaintPipeline
Expand Down
Loading

0 comments on commit c2b8481

Please sign in to comment.