Commit 52d3342

Added quantization example
1 parent f172c59 commit 52d3342

17 files changed

+320
-37
lines changed

README.md

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -4,3 +4,29 @@ A collection of code samples for working with Stability AI's models. This repo w
44
![Image-to-Image](./images/screenshot_image_to_image.png)
55

66
![Inpainting](./images/screenshot_inpainting.png)
7+
8+
## Stable Diffusion 3.5 Inference Speeds
9+
|Model|Inference Speed (seconds)|GPU|
10+
|-----|-------------------------|---|
11+
|SD3.5 M|4 s|NVIDIA H100 GPU with 80 GB of VRAM|
12+
|[4-Bit Quantized SD3.5 L](/sd35-text-to-image-quantized-gradio/)|18 s|NVIDIA H100 GPU with 80 GB of VRAM|
13+
|SD3.5 L|7 s|NVIDIA H100 GPU with 80 GB of VRAM|
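
As a rough sketch of how timings like the ones in this table could be reproduced, the helper below averages wall-clock time over a few runs after a warm-up call. The name `time_call` and its `warmup` and `runs` parameters are illustrative assumptions, not part of this repo; in practice `fn` would wrap a pipeline call such as `lambda: pipe(prompt=...)`.

```python
import time


def time_call(fn, warmup=1, runs=3):
    """Return the mean wall-clock seconds of fn() over `runs` calls.

    Hypothetical helper (not from this repo). Warm-up calls are excluded
    so one-time costs such as lazy initialization do not skew the average.
    """
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Stand-in workload; replace with a real image generation call.
    seconds = time_call(lambda: sum(range(100_000)))
    print(f"{seconds:.4f} s per call")
```

Averaging over several runs smooths out run-to-run noise; whether the figures in the table were measured this way is not stated in the source.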
14+
15+
## Stable Diffusion 3.5 Prompt Tuning Using Guidance Scale
16+
The [guidance_scale](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.guidance_scale) parameter has a significant impact on image generation with Stable Diffusion 3.5 models:
17+
> A higher guidance scale value encourages the model to generate images closely linked to the text prompt at the expense of lower image quality
18+
19+
Image quality can vary drastically based on the `guidance_scale` value. The screenshots below provide some recommended `guidance_scale` settings for three Stable Diffusion 3.5 models:
20+
* [Stable Diffusion 3.5 Large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large) (SD3.5 L)
21+
* [Sample code](./sd35-text-to-image-gradio/app.py)
22+
* [4-Bit Quantized Stable Diffusion 3.5 Large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large) (NF4 SD3.5 L)
23+
* NF4: [Normal Floating Point 4](https://huggingface.co/docs/diffusers/v0.32.2/en/quantization/bitsandbytes#normal-float-4-nf4)
24+
* [Sample code](./sd35-text-to-image-quantized-gradio/app.py)
25+
* [Stable Diffusion 3.5 Medium](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) (SD3.5 M)
26+
27+
### Guidance Scale Examples
28+
|Model|[guidance_scale](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img#diffusers.StableDiffusionPipeline.__call__.guidance_scale) (float 1-10)|Example|
29+
|-----|--------------|-------|
30+
|SD3.5 L|`guidance_scale=2.5`|![sd3.5 L guidance_scale=2.5](./images/guidance-scale-examples/sd3.5%20L%20guidance_scale=2.5.png)|
31+
|NF4 SD3.5 L|`guidance_scale=7.5`|![nf4 sd3.5 L guidance_scale=7.5](./images/guidance-scale-examples/nf4%20sd3.5%20L%20guidance_scale=7.5.png)|
32+
|SD3.5 M|`guidance_scale=5.0`|![sd3.5 M guidance_scale=5](./images/guidance-scale-examples/sd3.5%20M%20guidance_scale=5.png)|
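
The settings in the table can be captured in a small lookup so that each model gets its suggested value by default. This is a minimal sketch: `RECOMMENDED_GUIDANCE_SCALE`, `pick_guidance_scale`, and `generate` are hypothetical names, the 1 to 10 clamp mirrors the range in the table header, and `pipe` is assumed to be an already loaded `StableDiffusion3Pipeline` (or any callable that accepts the same keyword arguments and returns an object with an `images` list).

```python
# Suggested values copied from the table above; the keys are illustrative labels.
RECOMMENDED_GUIDANCE_SCALE = {
    "SD3.5 L": 2.5,
    "NF4 SD3.5 L": 7.5,
    "SD3.5 M": 5.0,
}


def pick_guidance_scale(model_label, override=None):
    """Return the override if given, else the table value, clamped to 1-10."""
    value = RECOMMENDED_GUIDANCE_SCALE[model_label] if override is None else override
    return min(10.0, max(1.0, float(value)))


def generate(pipe, model_label, prompt, negative_prompt="", guidance_scale=None):
    """Run one generation and return the first image.

    `pipe` is assumed to behave like StableDiffusion3Pipeline: callable with
    prompt / negative_prompt / guidance_scale keyword arguments.
    """
    result = pipe(
        prompt=prompt,
        negative_prompt=negative_prompt,
        guidance_scale=pick_guidance_scale(model_label, guidance_scale),
    )
    return result.images[0]
```

Passing an explicit `guidance_scale` still works for experimentation; the lookup only supplies a starting point.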

sd35-image-to-image-flask/README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1,7 +1,7 @@
11
# Stable Diffusion 3.5 Image-to-Image Python Flask App
22
This repo folder is for making a simple Stable Diffusion 3.5 Image-to-Image API, using Python Flask
33

4-
**Estimated Inference Speed:** 23 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU
4+
**Estimated Inference Speed:** 7 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU
55

66
**[Postman](https://www.postman.com/downloads/) Screenshot:**
77
![Postman Screenshot](./images/postman_screenshot.png)

sd35-image-to-image-gradio/README.md

Lines changed: 13 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,11 @@
11
# Stable Diffusion 3.5 Image-to-Image in Gradio
22
Gradio demo of [image-to-image](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/img2img) using Stable Diffusion 3.5 Medium
33

4-
**Estimated Inference Speed:** 23 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU
4+
**Estimated Inference Speed:** 7 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU
55

66
Full documentation is available on Hugging Face: [Stable Diffusion Image-to-image](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/img2img)
77

8-
### Screen Shot
8+
### Screenshot
99
![Screenshot](./images/screenshot.png)
1010

1111
## Quick Start
@@ -66,28 +66,32 @@ init_image = init_image.resize((640, 1536))
6666
```
6767
#### 1536x1536
6868

69-
![1536x1536](./images/input-image-size-examples/1536x1536.png)
69+
![1536x1536](./images/input-image-size-examples/1536x1536.png)
7070

7171
#### 640x640
7272

73-
![640x640](./images/input-image-size-examples/640x640.png)
73+
![640x640](./images/input-image-size-examples/640x640.png)
7474

7575
#### 64x64
7676

77-
![64x64](./images/input-image-size-examples/64x64.png)
77+
![64x64](./images/input-image-size-examples/64x64.png)
7878

7979
#### 20x20
8080

81-
![20x20](./images/input-image-size-examples/20x20.png)
81+
![20x20](./images/input-image-size-examples/20x20.png)
8282

8383
#### 1x1536
8484

85-
![1x1536](./images/input-image-size-examples/1x1536.png)
85+
**NOTE:** The error is due to the [Pillow](https://pypi.org/project/pillow/) [PIL.Image.resize()](https://github.com/Stability-AI/stability-ai-toolkit/blob/main/sd35-image-to-image-gradio/app.py#L56) method rejecting the resize dimensions. Developers should test whether SD3.5 image-to-image can tolerate these dimensions.
86+
87+
![1x1536](./images/input-image-size-examples/1x1536.png)
8688

8789
#### 5x12
8890

89-
![5x12](./images/input-image-size-examples/5x12.png)
91+
**NOTE:** The error is due to the [Pillow](https://pypi.org/project/pillow/) [PIL.Image.resize()](https://github.com/Stability-AI/stability-ai-toolkit/blob/main/sd35-image-to-image-gradio/app.py#L56) method rejecting the resize dimensions. Developers should test whether SD3.5 image-to-image can tolerate these dimensions.
92+
93+
![5x12](./images/input-image-size-examples/5x12.png)
9094

9195
#### 640x1536
9296

93-
![640x1536](./images/input-image-size-examples/640x1536.png)
97+
![640x1536](./images/input-image-size-examples/640x1536.png)

sd35-image-to-image-gradio/example_prompts.txt

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -2,18 +2,18 @@ positive prompt:
22
Replace the soldiers with elves holding bows and arrows
33

44
positive prompt:
5-
Replace the soldiers with elves holding crossbows, first-person-shooter screen shot, 4k
5+
Replace the soldiers with elves holding crossbows, first-person-shooter screenshot, 4k
66
The elves are wearing hoods
77

88

99
positive prompt:
10-
Replace the soldiers with elves holding bows and arrows, first-person-shooter screen shot, 4k
10+
Replace the soldiers with elves holding bows and arrows, first-person-shooter screenshot, 4k
1111
The elves are wearing hoods
1212
There is a dragon flying in the sky
1313

1414

1515
positive prompt:
16-
Replace the soldiers with elves holding bows and arrows, video game screen shot, 4k
16+
Replace the soldiers with elves holding bows and arrows, video game screenshot, 4k
1717
The elves are wearing hoods. There is one dragon flying in the sky
1818

1919
negative prompt:

sd35-inpainting-gradio/README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,9 +1,9 @@
11
# Stable Diffusion 3.5 Inpainting in Gradio
22
Gradio demo of inpainting using Stable Diffusion 3.5 Large
33

4-
**Estimated Inference Speed:** 23 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU
4+
**Estimated Inference Speed:** 7 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU
55

6-
### Screen Shot
6+
### Screenshot
77
![screenshot.png](./images/screenshot.png)
88

99
#### Input Image and Gradio ImageMask

sd35-text-to-image-gradio/README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -3,9 +3,9 @@ Gradio demo of [text-to-image](https://huggingface.co/docs/diffusers/api/pipelin
33

44
Full documentation is available on Hugging Face: [Stable Diffusion Text-to-image](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img)
55

6-
**Estimated Inference Speed:** 23 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU
6+
**Estimated Inference Speed:** 7 seconds for Stable Diffusion 3.5 Large on an NVIDIA H100 GPU
77

8-
### Screen Shot
8+
### Screenshot
99
![Screenshot](./images/screenshot.png)
1010

1111
## Quick Start

sd35-text-to-image-gradio/app.py

Lines changed: 13 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -20,7 +20,6 @@
2020
import torch
2121
import os
2222

23-
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
2423
from diffusers import StableDiffusion3Pipeline
2524
from huggingface_hub import login
2625

@@ -43,6 +42,16 @@ def login_to_hugging_face(self):
4342
login()
4443
print("\nWARNING: To avoid the Hugging Face login prompt in the future, please set the HF_TOKEN environment variable:\n\n export HF_TOKEN=<YOUR HUGGING FACE USER ACCESS TOKEN>\n")
4544

45+
def _check_shader(self):
46+
if torch.backends.mps.is_available():
47+
device = "mps"
48+
elif torch.cuda.is_available():
49+
device = "cuda"
50+
else:
51+
device = "cpu"
52+
53+
return device
54+
4655
def _predict(self, guidance_scale, prompt, negative_prompt, progress=gr.Progress(track_tqdm=True)):
4756
images = self._pipe(
4857
prompt=prompt,
@@ -65,26 +74,11 @@ def _start_gradio(self):
6574
).launch(debug=True, share=True)
6675

6776
def start_text_to_image(self):
68-
model_id = "stabilityai/stable-diffusion-3.5-large"
69-
70-
nf4_config = BitsAndBytesConfig(
71-
load_in_4bit=True,
72-
bnb_4bit_quant_type="nf4",
73-
bnb_4bit_compute_dtype=torch.bfloat16
74-
)
75-
model_nf4 = SD3Transformer2DModel.from_pretrained(
76-
model_id,
77-
subfolder="transformer",
78-
quantization_config=nf4_config,
79-
torch_dtype=torch.bfloat16
80-
)
81-
8277
self._pipe = StableDiffusion3Pipeline.from_pretrained(
83-
model_id,
84-
transformer=model_nf4,
85-
torch_dtype=torch.bfloat16
78+
"stabilityai/stable-diffusion-3.5-large", torch_dtype=torch.bfloat16
8679
)
87-
self._pipe.enable_model_cpu_offload()
80+
device = self._check_shader()
81+
self._pipe.to(device)
8882

8983
self._start_gradio()
9084
return 0

sd35-text-to-image-gradio/example_prompts.txt

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2,4 +2,11 @@ positive prompt:
22
Children's birthday party
33

44
negative prompt:
5-
No birthday cake
5+
No birthday cake
6+
7+
8+
positive prompt:
9+
A group of elves hunting a dragon, 4k cinema
10+
11+
negative prompt:
12+
No green grass
Lines changed: 106 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,106 @@
1+
# 4-Bit Quantized Stable Diffusion 3.5 Text-to-Image in Gradio
2+
Gradio demo of [text-to-image](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img) using 4-bit quantized Stable Diffusion 3.5 Large
3+
4+
Full documentation is available on Hugging Face: [Stable Diffusion Text-to-image](https://huggingface.co/docs/diffusers/api/pipelines/stable_diffusion/text2img)
5+
6+
**Estimated Inference Speed:** 18 seconds for quantized Stable Diffusion 3.5 Large on an NVIDIA H100 GPU
7+
8+
### Screenshot
9+
![Screenshot](./images/screenshot.png)
10+
11+
## Quick Start
12+
1. Open a web browser, log in to Hugging Face and register your name and email,
13+
to use [stable-diffusion-3.5-large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large)
14+
2. Create a new Hugging Face [user access token](https://huggingface.co/docs/hub/en/security-tokens),
15+
which will capture that you completed the registration form
16+
3. Clone this repo to your machine and change into the directory for this demo:
17+
```
18+
cd ./stability-ai-toolkit/sd35-text-to-image-quantized-gradio
19+
```
20+
4. Set up the app in a Python virtual environment:
21+
22+
```
23+
python -m venv <your_environment_name>
24+
source <your_environment_name>/bin/activate
25+
```
26+
5. Set your `HF_TOKEN` inside your virtual environment
27+
```
28+
export HF_TOKEN=<Hugging Face user access token>
29+
```
30+
6. Install dependencies
31+
```
32+
pip install -r requirements.txt
33+
```
34+
35+
NOTE: Read [requirements.txt](./requirements.txt) for
36+
[MacOS PyTorch installation instructions](https://developer.apple.com/metal/pytorch/)
37+
38+
TL;DR:
39+
```
40+
# Inside your virtual environment
41+
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
42+
```
43+
7. Start the app
44+
```
45+
python app.py
46+
```
47+
8. Open UI in a web browser: [http://127.0.0.1:7861](http://127.0.0.1:7861)
48+
49+
## How to Quantize Stable Diffusion 3.5 Large
50+
### [With Quantization](./app.py)
51+
```
52+
import torch
53+
from diffusers import BitsAndBytesConfig, SD3Transformer2DModel
54+
from diffusers import StableDiffusion3Pipeline
55+
...
56+
model_id = "stabilityai/stable-diffusion-3.5-large"
57+
58+
nf4_config = BitsAndBytesConfig(
59+
    load_in_4bit=True,
60+
    bnb_4bit_quant_type="nf4",
61+
    bnb_4bit_compute_dtype=torch.bfloat16
62+
)
63+
model_nf4 = SD3Transformer2DModel.from_pretrained(
64+
    model_id,
65+
    subfolder="transformer",
66+
    quantization_config=nf4_config,
67+
    torch_dtype=torch.bfloat16
68+
)
69+
70+
pipe = StableDiffusion3Pipeline.from_pretrained(
71+
    model_id, 
72+
    transformer=model_nf4,
73+
    torch_dtype=torch.bfloat16
74+
)
75+
pipe.enable_model_cpu_offload()
76+
```
77+
### [Without Quantization](/sd35-text-to-image-gradio/app.py)
78+
```
79+
import torch
80+
from diffusers import StableDiffusion3Pipeline
81+
...
82+
model_id = "stabilityai/stable-diffusion-3.5-large"
83+
84+
pipe = StableDiffusion3Pipeline.from_pretrained(
85+
    model_id,
86+
    torch_dtype=torch.bfloat16
87+
)
88+
```
89+
90+
## Why Use Quantized Stable Diffusion 3.5 Large
91+
92+
**NOTE:** There is a **SIGNIFICANT IMPROVEMENT** in **NEGATIVE PROMPTING** accuracy when using 4-bit quantized Stable Diffusion 3.5 Large
93+
94+
Many use cases for [Stable Diffusion 3.5 Large](https://huggingface.co/stabilityai/stable-diffusion-3.5-large) (SD3.5 L) need the capabilities of the model without its large memory footprint:
95+
* 4-bit quantization of SD3.5 L allows it to load onto GPUs with limited VRAM
96+
* 4-bit quantization makes it easier to offload certain parts of model execution to the CPU, further reducing GPU memory usage
97+
* There is often an acceptable decrease in generated image quality, with the benefit of a reduced cost due to reduced VRAM
98+
* Users working on their own computer with a retail GPU (or Apple Silicon with an integrated GPU) would benefit from this use case
99+
* [Stable Diffusion 3.5 Medium](https://huggingface.co/stabilityai/stable-diffusion-3.5-medium) (SD3.5 M) could alternatively be used as it has fewer parameters than Large and an inference speed that's even faster than quantized SD3.5 L
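
To make the memory argument concrete, here is a back-of-the-envelope sketch of weight storage at different precisions. The parameter counts (about 8.1 billion for Large and 2.5 billion for Medium) are approximate figures treated here as assumptions, and the estimate ignores activations, the text encoders, the VAE, and the small per-block overhead that NF4 adds for its scaling constants. `weight_memory_gib` is a hypothetical helper name.

```python
GIB = 2**30  # bytes per gibibyte

# Approximate parameter counts; treated as assumptions for this estimate.
ASSUMED_PARAMS = {
    "SD3.5 L": 8.1e9,
    "SD3.5 M": 2.5e9,
}


def weight_memory_gib(num_params, bits_per_param):
    """Estimate weight storage in GiB for a given precision."""
    return num_params * bits_per_param / 8 / GIB


if __name__ == "__main__":
    for label, bits in (("SD3.5 L", 16), ("SD3.5 L", 4), ("SD3.5 M", 16)):
        gib = weight_memory_gib(ASSUMED_PARAMS[label], bits)
        print(f"{label} at {bits}-bit: {gib:.1f} GiB of weights")
```

Under these assumptions, going from 16-bit to 4-bit weights cuts weight storage to a quarter, which is what lets SD3.5 L fit on GPUs with limited VRAM.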
100+
101+
### Stable Diffusion 3.5 Inference Speeds
102+
|Model|Inference Speed (seconds)|GPU|
103+
|-----|-------------------------|---|
104+
|SD3.5 M|4 s|NVIDIA H100 GPU with 80 GB of VRAM|
105+
|[4-Bit Quantized SD3.5 L](/sd35-text-to-image-quantized-gradio/)|18 s|NVIDIA H100 GPU with 80 GB of VRAM|
106+
|SD3.5 L|7 s|NVIDIA H100 GPU with 80 GB of VRAM|
