use fused_multi_head_attention_inference_v2 to implement CrossAttention #459

Merged
merged 7 commits into from
Dec 26, 2023
36 changes: 18 additions & 18 deletions README.md
@@ -1,32 +1,32 @@
[![PyPI version](https://badge.fury.io/py/onediff.svg)](https://badge.fury.io/py/onediff)
[![Docker image build](https://github.com/Oneflow-Inc/onediff/actions/workflows/sd.yml/badge.svg)](https://github.com/Oneflow-Inc/onediff/actions/workflows/sd.yml)
[![Run examples](https://github.com/Oneflow-Inc/onediff/actions/workflows/examples.yml/badge.svg?event=schedule)](https://github.com/Oneflow-Inc/onediff/actions/workflows/examples.yml?query=event%3Aschedule)
[![Docker image build](https://github.com/siliconflow/onediff/actions/workflows/sd.yml/badge.svg)](https://github.com/siliconflow/onediff/actions/workflows/sd.yml)
[![Run examples](https://github.com/siliconflow/onediff/actions/workflows/examples.yml/badge.svg?event=schedule)](https://github.com/siliconflow/onediff/actions/workflows/examples.yml?query=event%3Aschedule)

# OneDiff

**An out-of-the-box acceleration library for diffusion models** (especially for ComfyUI, HF diffusers, and Stable Diffusion web UI).

## Easy to use
- Acceleration for popular UIs/libs
- [ComfyUI](https://github.com/Oneflow-Inc/onediff/tree/main/onediff_comfy_nodes)
- [HF diffusers 🤗](https://github.com/Oneflow-Inc/onediff/tree/main/examples)
- [Stable Diffusion web UI](https://github.com/Oneflow-Inc/onediff/tree/main/onediff_sd_webui_extensions)
- [ComfyUI](https://github.com/siliconflow/onediff/tree/main/onediff_comfy_nodes)
- [HF diffusers 🤗](https://github.com/siliconflow/onediff/tree/main/examples)
- [Stable Diffusion web UI](https://github.com/siliconflow/onediff/tree/main/onediff_sd_webui_extensions)
- Acceleration for state-of-the-art Models
- [SDXL](https://github.com/Oneflow-Inc/onediff/blob/main/examples/text_to_image_sdxl.py)
- [SDXL Turbo](https://github.com/Oneflow-Inc/onediff/blob/main/examples/text_to_image_sdxl_turbo.py)
- [SD 1.5/2.1](https://github.com/Oneflow-Inc/onediff/blob/main/examples/text_to_image.py)
- [LoRA (and dynamic switching LoRA)](https://github.com/Oneflow-Inc/onediff/blob/main/examples/text_to_image_sdxl_lora.py)
- [ControlNet](https://github.com/Oneflow-Inc/onediff/blob/main/examples/text_to_image_controlnet.py)
- [LCM](https://github.com/Oneflow-Inc/onediff/blob/main/examples/text_to_image_lcm.py) and [LCM LoRA](https://github.com/Oneflow-Inc/onediff/blob/main/examples/text_to_image_lcm_lora_sdxl.py)
- [Stable Video Diffusion](https://github.com/Oneflow-Inc/onediff/blob/8a35a9e7df45bbfa5bb05011b8357480acb5836e/onediff_comfy_nodes/_nodes.py#L169)
- [DeepCache](https://github.com/Oneflow-Inc/onediff/blob/8a35a9e7df45bbfa5bb05011b8357480acb5836e/onediff_comfy_nodes/_nodes.py#L414)
- [SDXL](https://github.com/siliconflow/onediff/blob/main/examples/text_to_image_sdxl.py)
- [SDXL Turbo](https://github.com/siliconflow/onediff/blob/main/examples/text_to_image_sdxl_turbo.py)
- [SD 1.5/2.1](https://github.com/siliconflow/onediff/blob/main/examples/text_to_image.py)
- [LoRA (and dynamic switching LoRA)](https://github.com/siliconflow/onediff/blob/main/examples/text_to_image_sdxl_lora.py)
- [ControlNet](https://github.com/siliconflow/onediff/blob/main/examples/text_to_image_controlnet.py)
- [LCM](https://github.com/siliconflow/onediff/blob/main/examples/text_to_image_lcm.py) and [LCM LoRA](https://github.com/siliconflow/onediff/blob/main/examples/text_to_image_lcm_lora_sdxl.py)
- [Stable Video Diffusion](https://github.com/siliconflow/onediff/blob/8a35a9e7df45bbfa5bb05011b8357480acb5836e/onediff_comfy_nodes/_nodes.py#L169)
- [DeepCache](https://github.com/siliconflow/onediff/blob/8a35a9e7df45bbfa5bb05011b8357480acb5836e/onediff_comfy_nodes/_nodes.py#L414)
- Out-of-the-box acceleration
- [ComfyUI Nodes](https://github.com/Oneflow-Inc/onediff/tree/main/onediff_comfy_nodes)
- [Acceleration with oneflow_compile](https://github.com/Oneflow-Inc/onediff/blob/a38c5ea475c07b4527981ec5723ccac083ed0a9c/examples/text_to_image_sdxl.py#L53)
- [ComfyUI Nodes](https://github.com/siliconflow/onediff/tree/main/onediff_comfy_nodes)
- [Acceleration with oneflow_compile](https://github.com/siliconflow/onediff/blob/a38c5ea475c07b4527981ec5723ccac083ed0a9c/examples/text_to_image_sdxl.py#L53)
- Multi-resolution input
- Compile and save the compiled result offline, then load it online for serving
- [Save and Load](https://github.com/Oneflow-Inc/onediff/blob/main/examples/text_to_image_sdxl_save_load.py)
- [Change device to do multi-process serving](https://github.com/Oneflow-Inc/onediff/blob/main/examples/text_to_image_sdxl_mp_load.py)
- [Save and Load](https://github.com/siliconflow/onediff/blob/main/examples/text_to_image_sdxl_save_load.py)
- [Change device to do multi-process serving](https://github.com/siliconflow/onediff/blob/main/examples/text_to_image_sdxl_mp_load.py)

## State-of-the-art performance
Updated on Nov 6, 2023.
@@ -101,7 +101,7 @@ python3 -m pip install "torch" "transformers==4.27.1" "diffusers[torch]==0.19.3"

#### 3. Install OneDiff
```
git clone https://github.com/Oneflow-Inc/onediff.git
git clone https://github.com/siliconflow/onediff.git
cd onediff && python3 -m pip install -e .
```

2 changes: 1 addition & 1 deletion benchmarks/docker/config/community-default.yaml
@@ -3,7 +3,7 @@ set_pip_mirror: "pip config set global.index-url https://pypi.tuna.tsinghua.edu.
oneflow_pip_index: "https://oneflow-pro.oss-cn-beijing.aliyuncs.com/branch/community/cu122"
repos:
- onediff:
repo_url: https://github.com/Oneflow-Inc/onediff.git
repo_url: https://github.com/siliconflow/onediff.git
branch: main
cmds:
- "python3 -m pip install transformers==4.27.1 diffusers[torch]==0.19.3"
4 changes: 2 additions & 2 deletions onediff_comfy_nodes/README.md
@@ -56,7 +56,7 @@ pip install --pre oneflow -f https://oneflow-pro.oss-cn-beijing.aliyuncs.com/bra
3. Install onediff

```bash
https://github.com/Oneflow-Inc/onediff.git
git clone https://github.com/siliconflow/onediff.git
cd onediff && pip install -e .
```

@@ -69,7 +69,7 @@ cp -r onediff_comfy_nodes path/to/ComfyUI/custom_nodes/

6. (Optional) Enterprise Level Support

If you need Enterprise Level Support for your system or business, please refer to [OneDiff Enterprise Edition](https://github.com/Oneflow-Inc/onediff/blob/main/README.md#onediff-enterprise-edition) .
If you need Enterprise Level Support for your system or business, please refer to [OneDiff Enterprise Edition](https://github.com/siliconflow/onediff/blob/main/README.md#onediff-enterprise-edition) .



13 changes: 6 additions & 7 deletions onediff_sd_webui_extensions/README.md
@@ -6,21 +6,20 @@

## Performance of Community Edition

Updated on DEC 13, 2023. Device: RTX 3090. Resolution: 1024x1024

| torch(Baseline) | onediff(Optimized) | Percentage improvement |
| --------------- | ------------------ | ---------------------- |
| 2.99it/s | 4.49it/s | 50.17% |
Updated on DEC 26, 2023. Device: RTX 3090. Resolution: 1024x1024
| torch(Baseline) | TensorRT-v9.0.1 | onediff(Optimized) | Percentage improvement |
| --------------- | --------------- | ------------------ | ---------------------- |
| 2.99it/s | 6.40it/s | 6.71it/s | 124.41% |
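
For reference, a percentage improvement over a baseline is conventionally computed as (optimized / baseline - 1) x 100, whereas optimized / baseline x 100 expresses throughput as a percentage of the baseline. The sketch below applies both formulas to the it/s figures quoted in the tables above; the function names are illustrative and not part of OneDiff.

```python
def improvement_percent(baseline: float, optimized: float) -> float:
    """Relative gain over the baseline, in percent."""
    return round((optimized / baseline - 1.0) * 100.0, 2)


def ratio_percent(baseline: float, optimized: float) -> float:
    """Optimized throughput expressed as a percentage of the baseline."""
    return round(optimized / baseline * 100.0, 2)


# Throughput figures (it/s) copied from the tables above.
TORCH_BASELINE = 2.99
ONEDIFF_DEC_13 = 4.49
ONEDIFF_DEC_26 = 6.71

if __name__ == "__main__":
    print(improvement_percent(TORCH_BASELINE, ONEDIFF_DEC_13))
    print(improvement_percent(TORCH_BASELINE, ONEDIFF_DEC_26))
    print(ratio_percent(TORCH_BASELINE, ONEDIFF_DEC_26))
```

The two formulas differ by exactly 100 percentage points, so it is worth checking which one a "Percentage improvement" column actually reports.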

## Installation Guide

It is recommended to create a Python virtual environment in advance. For example `conda create -n sd-webui python=3.10`.

```bash
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
git clone https://github.com/Oneflow-Inc/onediff.git
git clone https://github.com/siliconflow/onediff.git
cp -r onediff/onediff_sd_webui_extensions stable-diffusion-webui/extensions/
cd stable-diffusion-webui && git checkout 4afaaf8 # The tested git commit id is 4afaaf8.
cp -r ../onediff/onediff_sd_webui_extensions stable-diffusion-webui/extensions/

# Install all of stable-diffusion-webui's dependencies.
venv_dir=- bash webui.sh --port=8080
2 changes: 1 addition & 1 deletion onediff_sd_webui_extensions/install.py
@@ -7,7 +7,7 @@ def install():
launch.run_pip("install --pre oneflow -f https://oneflow-pro.oss-cn-beijing.aliyuncs.com/branch/community/cu118")
if not launch.is_installed("onediff"):
print("onediff is not installed! Installing...")
launch.run_pip("install git+https://github.com/Oneflow-Inc/onediff.git")
launch.run_pip("install git+https://github.com/siliconflow/onediff.git")


install()
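
The install script above follows a "check, then pip install" pattern through the web UI's `launch` helpers. A minimal standalone sketch of the same pattern, using only the standard library, might look like this. The helper names below are illustrative stand-ins, not the web UI's actual `launch` implementation, and the command is only built, never executed.

```python
import importlib.util
import sys


def is_installed(package: str) -> bool:
    """Return True if `package` can be located by the import system."""
    try:
        return importlib.util.find_spec(package) is not None
    except (ModuleNotFoundError, ValueError):
        # Raised for dotted names with a missing parent, or for odd sys.modules entries.
        return False


def pip_install_command(requirement: str) -> list:
    """Build (but do not run) a pip command bound to the running interpreter."""
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    if not is_installed("onediff"):
        print("onediff is not installed! Would run:")
        print(" ".join(pip_install_command("git+https://github.com/siliconflow/onediff.git")))
```

Binding the command to `sys.executable` keeps the install in the same environment as the running process, which matters when the web UI runs inside a virtual environment.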
72 changes: 18 additions & 54 deletions onediff_sd_webui_extensions/scripts/onediff.py
@@ -25,6 +25,7 @@ def _(mod, verbose=False) -> ListConfig:
_compiled = None



# https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/sd_hijack_optimizations.py#L142
# https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/modules/sd_hijack_optimizations.py#L221
class CrossAttentionOflow(nn.Module):
@@ -71,59 +72,18 @@ def forward(
k_in = self.to_k(context_k)
v_in = self.to_v(context_v)

dtype = q_in.dtype
# from modules import shared
# if shared.opts.upcast_attn:
# q_in, k_in, v_in = q_in.float(), k_in.float(), v_in if v_in.device.type == 'mps' else v_in.float()

# with devices.without_autocast(disable=not shared.opts.upcast_attn):
k_in = k_in * self.scale

del context, x

q, k, v = (rearrange(t, 'b n (h d) -> (b h) n d', h=h) for t in (q_in, k_in, v_in))
del q_in, k_in, v_in

r1 = flow.zeros(q.shape[0], q.shape[1], v.shape[2], device=q.device, dtype=q.dtype)

# mem_free_total = get_available_vram()
from modules import sd_hijack_optimizations
mem_free_total = sd_hijack_optimizations.get_available_vram()

gb = 1024 ** 3
tensor_size = q.shape[0] * q.shape[1] * k.shape[1] * q.element_size()
modifier = 3 if q.element_size() == 2 else 2.5
mem_required = tensor_size * modifier
steps = 1

if mem_required > mem_free_total:
steps = 2 ** (math.ceil(math.log(mem_required / mem_free_total, 2)))

if steps > 64:
max_res = math.floor(math.sqrt(math.sqrt(mem_free_total / 2.5)) / 8) * 64
raise RuntimeError(f'Not enough memory, use lower resolution (max approx. {max_res}x{max_res}). '
f'Need: {mem_required / 64 / gb:0.1f}GB free, Have:{mem_free_total / gb:0.1f}GB free')

slice_size = q.shape[1] // steps
for i in range(0, q.shape[1], slice_size):
end = min(i + slice_size, q.shape[1])
s1 = einsum('b i d, b j d -> b i j', q[:, i:end], k)

# s2 = s1.softmax(dim=-1, dtype=q.dtype)
s2 = s1.softmax(dim=-1)
del s1

r1[:, i:end] = einsum('b i j, b j d -> b i d', s2, v)
del s2

del q, k, v

r1 = r1.to(dtype)

r2 = rearrange(r1, '(b h) n d -> b n (h d)', h=h)
del r1

return self.to_out(r2)
out = flow._C.fused_multi_head_attention_inference_v2(
query=q_in,
query_layout="BM(HK)",
query_head_size=self.to_q.out_features//self.heads,
key=k_in,
key_layout="BM(HK)",
value=v_in,
value_layout="BM(HK)",
output_layout="BM(HK)",
causal=False,
)
return self.to_out(out)
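
To illustrate why the query-sliced loop removed above can be replaced by a single attention call, here is a pure-Python reference sketch of multi-head attention over inputs in the `BM(HK)` layout (batch, sequence, heads x head size). It is not the OneFlow kernel. The 1/sqrt(head_size) scale is an assumption: the diff drops the explicit `k_in * self.scale` without passing a scale argument, so the fused op is presumed to apply it by default. The `slice_size` parameter mimics the removed slicing over query rows.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def split_heads(x, heads):
    """[M][H*K] -> [H][M][K], i.e. 'n (h d) -> h n d' for one batch item."""
    head_size = len(x[0]) // heads
    return [[row[h * head_size:(h + 1) * head_size] for row in x] for h in range(heads)]


def attend(q, k, v, scale):
    """softmax(q @ k^T * scale) @ v for one head; q is [Mq][K], k and v are [Mk][K]."""
    out = []
    for q_row in q:
        scores = [scale * sum(a * b for a, b in zip(q_row, k_row)) for k_row in k]
        weights = softmax(scores)
        out.append([sum(w * v_row[d] for w, v_row in zip(weights, v)) for d in range(len(v[0]))])
    return out


def attention_bmhk(query, key, value, heads, slice_size=None):
    """Reference attention over [B][M][H*K] inputs, returning the same layout."""
    head_size = len(query[0][0]) // heads
    scale = 1.0 / math.sqrt(head_size)  # assumed default scaling, see lead-in
    result = []
    for q_b, k_b, v_b in zip(query, key, value):
        per_head = []
        for q, k, v in zip(split_heads(q_b, heads), split_heads(k_b, heads), split_heads(v_b, heads)):
            step = slice_size or len(q)
            rows = []
            for start in range(0, len(q), step):  # slicing over query rows only
                rows.extend(attend(q[start:start + step], k, v, scale))
            per_head.append(rows)
        # Merge heads back: [H][M][K] -> [M][H*K].
        result.append([[x for h in range(heads) for x in per_head[h][m]] for m in range(len(q_b))])
    return result


if __name__ == "__main__":
    q = [[[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8], [0.9, 1.0, 1.1, 1.2]]]
    k = [[[0.2, 0.1, 0.4, 0.3], [0.6, 0.5, 0.8, 0.7]]]
    v = [[[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]]
    full = attention_bmhk(q, k, v, heads=2)
    sliced = attention_bmhk(q, k, v, heads=2, slice_size=1)
    print(full == sliced)
```

Each query row attends to all keys independently of the other query rows, so slicing the queries only changes peak memory, not the result. That is why a fused kernel that handles memory internally can stand in for the manual loop.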


# https://github.com/Stability-AI/generative-models/blob/e5963321482a091a78375f3aeb2c3867562c913f/sgm/modules/diffusionmodules/wrappers.py#L24
@@ -161,6 +121,8 @@ def forward(self, t_emb):
CrossAttention: CrossAttentionOflow,
GroupNorm32: GroupNorm32Oflow,
}


register(package_names=["sgm"], torch2oflow_class_map=torch2oflow_class_map)


@@ -171,6 +133,9 @@ def compile(sd_model):
return
global _compiled
_compiled = oneflow_compile(sd_model.model.diffusion_model, use_graph=True)
# add sgm package path to sys.path to avoid mock error
import sgm, sys
sys.path.append(sgm.__path__[0][:-4])
Collaborator

Please note the reason here; this looks rather odd.

Contributor

@ccssu ccssu Dec 26, 2023

Under oneflow.mock.enable(), the sgm module cannot be found. The earlier approach of copying the folder did not have this problem because the path had already been added to sys.path.

Specifically, mock does not locate the module through the spec recorded for sgm in disable_mod_cache, so the module cannot be found under the corresponding oneflow_mock path.

time_embed_wrapper = TimeEmbedModule(_compiled._deployable_module_model.oneflow_module.time_embed)
# https://github.com/Stability-AI/generative-models/blob/e5963321482a091a78375f3aeb2c3867562c913f/sgm/modules/diffusionmodules/openaimodel.py#L984
setattr(_compiled._deployable_module_model.oneflow_module, "time_embed", time_embed_wrapper)
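
The `sgm.__path__[0][:-4]` expression in the hunk above strips the trailing path separator plus the three characters of `sgm`, leaving the directory that contains the package. A small sketch of that slice next to an `os.path.dirname` alternative that does not hard-code the name length; the example path and function names are hypothetical.

```python
import os


def parent_by_slice(package_dir: str, package_name: str) -> str:
    """Strip the trailing '<sep><package_name>' by length, like path[:-4] for 'sgm'."""
    return package_dir[: -(len(package_name) + 1)]


def parent_by_dirname(package_dir: str) -> str:
    """Same result without depending on the length of the package name."""
    return os.path.dirname(package_dir)


if __name__ == "__main__":
    # Hypothetical install location, for illustration only.
    example = os.path.join("opt", "generative-models", "sgm")
    print(parent_by_slice(example, "sgm"))
    print(parent_by_dirname(example))
```

Both forms yield the same directory for a path without a trailing separator; the `dirname` form simply avoids a magic number.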
@@ -187,7 +152,6 @@ def run(self, p):
global _compiled
if _compiled is None:
compile(shared.sd_model)
# compile(shared.sd_model)
original = shared.sd_model.model.diffusion_model
from sgm.modules.diffusionmodules.wrappers import OpenAIWrapper
orig_forward = OpenAIWrapper.forward
2 changes: 1 addition & 1 deletion setup.py
@@ -4,7 +4,7 @@
name="onediff",
version="0.11.0.dev",
description="OneFlow backend for diffusers",
url="https://github.com/Oneflow-Inc/oneflow",
url="https://github.com/siliconflow/oneflow",
author="OneFlow contributors",
license="Apache",
author_email="caishenghang@oneflow.org",