
Apply applicable quantization_config to model components when loading a model #10327


Open
vladmandic opened this issue Dec 20, 2024 · 15 comments

@vladmandic
Contributor

vladmandic commented Dec 20, 2024

With the recent improvements to quantization_config, the memory requirements of models such as SD35 and FLUX.1 are much lower.
However, the user currently has to manually load each model component they want quantized and then assemble the pipeline.

For example:

from diffusers import BitsAndBytesConfig, SD3Transformer2DModel, StableDiffusion3Pipeline
from transformers import T5EncoderModel

quantization_config = BitsAndBytesConfig(...)
transformer = SD3Transformer2DModel.from_pretrained(repo_id, subfolder="transformer", quantization_config=quantization_config)
text_encoder = T5EncoderModel.from_pretrained(repo_id, subfolder="text_encoder_3", quantization_config=quantization_config)
pipe = StableDiffusion3Pipeline.from_pretrained(repo_id, transformer=transformer, text_encoder=text_encoder)

The ask is to allow the pipeline loader itself to accept quantization_config and automatically apply it to the applicable modules when it is present.
That would allow much simpler use without the user needing to know the exact internal components of each model:

quantization_config = BitsAndBytesConfig(...)
pipe = StableDiffusion3Pipeline.from_pretrained(repo_id, quantization_config=quantization_config)

This is a generic ask that should work for pretty much all models, although the primary use case is the most popular models such as SD35 and FLUX.1.

@yiyixuxu @sayakpaul @DN6 @asomoza

@sayakpaul
Member

Yeah, this is planned. I thought we had created an issue to track it, but clearly it had slipped through the cracks.

We should also have something like exclude_modules to let users specify the names of the models not to quantize (typically the CLIP text encoder, the VAE, or any model that doesn't have enough linear layers to benefit from the classic quantization techniques).
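
A rough sketch of what that could look like from the user's side (purely hypothetical at this point; the exclude_modules argument and its placement are assumptions, not an existing API):

import torch
from diffusers import BitsAndBytesConfig, StableDiffusion3Pipeline

quantization_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-large",
    quantization_config=quantization_config,
    # hypothetical argument: modules to leave unquantized by default
    exclude_modules=["text_encoder", "text_encoder_2", "vae"],
    torch_dtype=torch.bfloat16,
)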

@vladmandic
Contributor Author

We should also have something like exclude_modules to let users specify the names of the models not to quantize

Yup! And it can have a default value with exactly the ones you've mentioned.

github-actions bot
Contributor

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Jan 20, 2025
@vladmandic
Contributor Author

ping to remove stale

@github-actions github-actions bot removed the stale Issues that haven't received updates label Jan 21, 2025
@vladmandic
Contributor Author

vladmandic commented Feb 12, 2025

Any updates on this one?
I just added lumina2 support and quantization works.
But instead of simply using Lumina2Text2ImgPipeline.from_pretrained (or even the auto pipeline), I need to manually pre-load the transformer using Lumina2Transformer2DModel and the text encoder using transformers.AutoModel to assemble the pipeline.
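
For reference, a minimal sketch of that manual assembly, assuming the Alpha-VLLM/Lumina-Image-2.0 repository layout and 4-bit bitsandbytes quantization (repo id and subfolder names are assumptions):

import torch
from diffusers import BitsAndBytesConfig as DiffusersBitsAndBytesConfig
from diffusers import Lumina2Text2ImgPipeline, Lumina2Transformer2DModel
from transformers import AutoModel
from transformers import BitsAndBytesConfig as TransformersBitsAndBytesConfig

repo_id = "Alpha-VLLM/Lumina-Image-2.0"  # assumed repo id

# the transformer takes the diffusers BitsAndBytesConfig...
transformer = Lumina2Transformer2DModel.from_pretrained(
    repo_id,
    subfolder="transformer",
    quantization_config=DiffusersBitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
# ...while the text encoder, loaded through transformers, needs the transformers one
text_encoder = AutoModel.from_pretrained(
    repo_id,
    subfolder="text_encoder",
    quantization_config=TransformersBitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16),
    torch_dtype=torch.bfloat16,
)
pipe = Lumina2Text2ImgPipeline.from_pretrained(
    repo_id, transformer=transformer, text_encoder=text_encoder, torch_dtype=torch.bfloat16
)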

@SunMarc
Member

SunMarc commented Feb 12, 2025

@sayakpaul is planning on adding this soon. Sorry for the delay.

Contributor

github-actions bot commented Mar 9, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot added the stale Issues that haven't received updates label Mar 9, 2025
@vladmandic
Contributor Author

ping to remove stale

@sayakpaul
Member

Definitely not stale. Will be prioritised soon.

@sayakpaul sayakpaul removed the stale Issues that haven't received updates label Mar 10, 2025
@sayakpaul
Member

@vladmandic

from diffusers.quantizers import PipelineQuantizationConfig
from diffusers import DiffusionPipeline
import torch

quant_config = PipelineQuantizationConfig(
    quant_backend="bitsandbytes_4bit",
    quant_kwargs={
        "load_in_4bit": True,
        "bnb_4bit_quant_type": "nf4",
        "bnb_4bit_compute_dtype": torch.bfloat16
    },
    exclude_modules=["text_encoder", "vae"]
)
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16
).to("cuda")

https://github.com/huggingface/diffusers/compare/feat/pipeline-quant-config?expand=1

@sayakpaul
Member

@vladmandic since you reacted to the above message, do feel free to provide feedback here.

@vladmandic
Contributor Author

I will as soon as I get back, traveling this week.

@sayakpaul
Member

You can also do:

from diffusers.quantizers import PipelineQuantizationConfig
from diffusers import DiffusionPipeline
import torch

quant_config = PipelineQuantizationConfig(
    mapping={
        "transformer": {
            "quant_backend": "bitsandbytes_4bit",
            "quant_kwargs": {
                "load_in_4bit": True,
                "bnb_4bit_quant_type": "nf4",
                "bnb_4bit_compute_dtype": torch.bfloat16
            }
        },
        "text_encoder_2": {
            "quant_backend": "bitsandbytes_4bit",
            "quant_kwargs": {
                "load_in_4bit": True,
                "bnb_4bit_quant_type": "nf4",
                "bnb_4bit_compute_dtype": torch.bfloat16
            }
        }
    }
)
pipe = DiffusionPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16
).to("cuda")

pipe_kwargs = {
    "prompt": "A cat holding a sign that says hello world",
    "height": 1024,
    "width": 1024,
    "guidance_scale": 3.5,
    "num_inference_steps": 50,
    "max_sequence_length": 512,
}

image = pipe(**pipe_kwargs, generator=torch.manual_seed(0)).images[0]
image.save("pipeline_quant.png")

This gives you more granular control.

@vladmandic
Contributor Author

I like the approach with PipelineQuantizationConfig, allowing either a general config with the ability to exclude modules or a specific per-module mapping config.

In general, this does provide a solution and I like it.

btw, while going over the new optimum-quanto support, I've noticed that diffusers.QuantoConfig is different from transformers.utils.quantization_config.QuantoConfig, so we cannot use the same config to load both the transformer and the text encoder (since the text encoder is loaded using transformers.T5EncoderModel.from_pretrained).
I don't see any particular reason why this config object is different; the only key thing missing is the activations property. Even if it's not implemented in diffusers, it would be good to have the configs compatible between diffusers and transformers.
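
For illustration, a rough sketch of the mismatch being described, using FLUX.1 as an example (the exact QuantoConfig parameter names are assumptions and may differ between versions):

import torch
from diffusers import FluxTransformer2DModel
from diffusers import QuantoConfig as DiffusersQuantoConfig
from transformers import T5EncoderModel
from transformers import QuantoConfig as TransformersQuantoConfig

repo_id = "black-forest-labs/FLUX.1-dev"

# diffusers' QuantoConfig: weights only, no `activations` option
transformer = FluxTransformer2DModel.from_pretrained(
    repo_id,
    subfolder="transformer",
    quantization_config=DiffusersQuantoConfig(weights_dtype="int8"),
    torch_dtype=torch.bfloat16,
)
# transformers' QuantoConfig: a separate class with a different signature
text_encoder = T5EncoderModel.from_pretrained(
    repo_id,
    subfolder="text_encoder_2",
    quantization_config=TransformersQuantoConfig(weights="int8", activations=None),
    torch_dtype=torch.bfloat16,
)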

@SunMarc
Member

SunMarc commented Mar 18, 2025

btw, while going over the new optimum-quanto support, I've noticed that diffusers.QuantoConfig is different from transformers.utils.quantization_config.QuantoConfig, so we cannot use the same config to load both the transformer and the text encoder (since the text encoder is loaded using transformers.T5EncoderModel.from_pretrained).
I don't see any particular reason why this config object is different; the only key thing missing is the activations property. Even if it's not implemented in diffusers, it would be good to have the configs compatible between diffusers and transformers.

Maybe we can create a mapping to make the diffusers and transformers configs compatible, @sayakpaul?
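
Something along these lines, as a purely hypothetical helper (not part of either library; parameter names are assumptions):

from diffusers import QuantoConfig as DiffusersQuantoConfig
from transformers import QuantoConfig as TransformersQuantoConfig


def to_diffusers_quanto_config(config: TransformersQuantoConfig) -> DiffusersQuantoConfig:
    # `activations` has no diffusers equivalent yet, so it is dropped here
    return DiffusersQuantoConfig(weights_dtype=config.weights)


shared_config = TransformersQuantoConfig(weights="int8")
diffusers_config = to_diffusers_quanto_config(shared_config)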
