allow models to run with a user-provided dtype map instead of a single dtype #10301
Conversation
Thanks! Do we not have to handle the typecasts? I think for sharded checkpoints, we might have to.
```python
sub_model_dtype = (
    torch_dtype.get(name, torch_dtype.get("_", torch.float32))
    if isinstance(torch_dtype, dict)
    else torch_dtype
)
```
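(A toy illustration of how this lookup resolves dtypes, outside the pipeline code; the map and component names here are made up:)

```python
import torch

torch_dtype = {"transformer": torch.bfloat16, "_": torch.float16}

for name in ("transformer", "text_encoder", "vae"):
    # Same resolution logic as the snippet above: exact component name first,
    # then the "_" fallback entry, then torch.float32.
    sub_model_dtype = (
        torch_dtype.get(name, torch_dtype.get("_", torch.float32))
        if isinstance(torch_dtype, dict)
        else torch_dtype
    )
    print(name, "->", sub_model_dtype)

# transformer -> torch.bfloat16
# text_encoder -> torch.float16  (falls back to "_")
# vae -> torch.float16           (falls back to "_")
```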
I feel like `_` might be a bit unintuitive. Better to expose full dtype maps, or, in case partial ones are provided, default to `torch.float32` for the rest of the components.
Could be `default`? Considering how it will work for integrations, instead of, say, `{'transformer': torch.bfloat16, 'text_encoder': torch.float16, 'text_encoder_2': torch.float16, 'text_encoder_3': torch.float16}` for SD3 and `{'transformer': torch.bfloat16, 'text_encoder': torch.float16, 'text_encoder_2': torch.float16}` for Flux. Not a big issue because components can be retrieved from `cls._get_signature_types()`.
Yeah no strong opinions.
For now it's renamed to `default` to be clearer; we can remove it later if it's not needed.
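For reference, usage would look roughly like this (a sketch, using HunyuanVideo, mentioned below, as an example checkpoint; the exact keys depend on the pipeline's components):

```python
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo",
    # Components not listed (e.g. the VAE) fall back to the "default" entry.
    torch_dtype={"transformer": torch.bfloat16, "default": torch.float16},
)
print(pipe.transformer.dtype)  # torch.bfloat16
print(pipe.vae.dtype)          # torch.float16
```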
Thanks for the review @sayakpaul. Will look into sharded checkpoints.
HunyuanVideo is sharded, so I think it's ok.
Would add a test to `PipelineTesterMixin` too.
f"Expected `{list(passed_class_obj.keys())}`, got extra `torch_dtype` keys `{extra_keys_dtype}`." | ||
) | ||
if len(extra_keys_obj) > 0: | ||
logger.warning( |
I don't think we need this warning. The expectation for passed class objects is that their dtype is already set, and if it isn't, it's handled at the model level, where `dtype=None` results in the FP32 default.
```python
try:
    safetensors.torch.save_file(shard, filepath, metadata={"format": "pt"})
except RuntimeError:
    # save_file refuses shared tensors; fall back to save_model, which
    # deduplicates shared storage (but writes the whole model, not the shard).
    safetensors.torch.save_model(model_to_save, filepath, metadata={"format": "pt"})
```
Do we know why this is erroring out for this test? With the current fix, for a sharded checkpoint, we might end up saving the entire model multiple times, no?
From `safetensors`: it doesn't allow saving shared tensors without using `save_model`. Looks like this is why we're using `safe_serialization=False` in other tests. If it's an issue like this, then a similar issue exists regardless: we couldn't save a sharded checkpoint that has shared tensors with `safetensors`. IMO this is a `safetensors` problem; it should not be so opinionated about what can or cannot be saved with `save_file`. Shared tensors are always minimal, and duplicating them would make little difference to the overall size. The documentation on this matter also does not seem to align with our own findings: it mentions that buffers are consumed once we use `get_tensor`, however we have seen that memory is held for the duration of the `safe_open` context.
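For context, a minimal repro of the `save_file` restriction (a sketch assuming safetensors' standard PyTorch helpers; the names and file paths are made up):

```python
import torch
import safetensors.torch

# Two state-dict keys pointing at the same storage, i.e. shared tensors.
weight = torch.randn(4, 4)
tensors = {"a.weight": weight, "b.weight": weight}

try:
    safetensors.torch.save_file(tensors, "shared.safetensors")
except RuntimeError as err:
    # save_file rejects shared tensors outright.
    print("save_file refused shared tensors:", err)

# save_model accepts tied weights because it deduplicates shared storage.
class TiedModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.a = torch.nn.Linear(4, 4, bias=False)
        self.b = torch.nn.Linear(4, 4, bias=False)
        self.b.weight = self.a.weight  # tie the weights

safetensors.torch.save_model(TiedModel(), "tied.safetensors")
```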
diffusers/tests/models/test_modeling_common.py, lines 445 to 446 at d8c617c:

```python
with tempfile.TemporaryDirectory() as tmpdirname:
    model.save_pretrained(tmpdirname, safe_serialization=False)
```
Hmm I think in this case it would be better to just skip the test for Unidiffuser (it has very low usage) than change the saving logic for all pipelines.
Took a quick look. The issue is with the `text_decoder` component, which is a transformers `PreTrainedModel` wrapped in a `ModelMixin` class, so the transformers logic for saving shared tensors is never invoked:
https://github.com/huggingface/transformers/blob/ed95493ce05688447d15d9a82d2d70695290ecff/src/transformers/modeling_utils.py#L3464-L3479
We can skip the test and deprecate this pipeline later 👍🏽
#11194 reverts the change, and the test passes with `safe_serialization=False`.
Commits merged from main into this branch:

* Raise warning and round down if Wan num_frames is not 4k + 1 (huggingface#11167)
* [Docs] Fix environment variables in `installation.md` (huggingface#11179)
* Add `latents_mean` and `latents_std` to `SDXLLongPromptWeightingPipeline` (huggingface#11034)
* Bug fix in LTXImageToVideoPipeline.prepare_latents() when latents is already set (huggingface#10918)
* [tests] no hard-coded cuda (huggingface#11186)
* [WIP] Add Wan Video2Video (huggingface#11053)
* map BACKEND_RESET_MAX_MEMORY_ALLOCATED to reset_peak_memory_stats on XPU (huggingface#11191)
* fix autocast (huggingface#11190)
* fix: for checking mandatory and optional pipeline components (huggingface#11189)
* remove unnecessary call to `F.pad` (huggingface#10620)
* allow models to run with a user-provided dtype map instead of a single dtype (huggingface#10301)
* [tests] HunyuanDiTControlNetPipeline inference precision issue on XPU (huggingface#11197)
* Revert `save_model` in ModelMixin save_pretrained and use safe_serialization=False in test (huggingface#11196)
* [docs] `torch_dtype` map (huggingface#11194)
* Fix enable_sequential_cpu_offload in CogView4Pipeline (huggingface#11195)
* SchedulerMixin from_pretrained and ConfigMixin Self type annotation (huggingface#11192)
* Update import_utils.py: added onnxruntime-vitisai for custom build onnxruntime pkg (huggingface#10329)
* Add CacheMixin to Wan and LTX Transformers (huggingface#11187)
* feat: [Community Pipeline] - FaithDiff Stable Diffusion XL Pipeline for Image SR (huggingface#11188)
* [Model Card] standardize advanced diffusion training sdxl lora (huggingface#7615)
* Change KolorsPipeline LoRA Loader to StableDiffusion (huggingface#11198)
* Update Style Bot workflow (huggingface#11202)
What does this PR do?

Example

`default` is used as the dtype for components that are not specified; otherwise the current default of `torch.float32` is used.

Haven't looked at the `from_pipe` case yet, and we'll need to add tests, but this is ready for a first review in case something is missing, because it's simpler than expected.

Fixes #10108
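To make the fallback concrete (a hedged sketch; the checkpoint name is illustrative):

```python
import torch
from diffusers import DiffusionPipeline

# Partial map with no "default" key: unspecified components load in torch.float32.
pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3-medium-diffusers",
    torch_dtype={"transformer": torch.bfloat16},
)
print(pipe.transformer.dtype)   # torch.bfloat16
print(pipe.text_encoder.dtype)  # torch.float32 (no "default" provided)
```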
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.
cc @DN6 @sayakpaul @yiyixuxu