Honor model dtype in load_checkpoint #920

Merged 2 commits into main from honor_model_dtype on Dec 20, 2022

Conversation

sgugger (Collaborator) commented on Dec 13, 2022

This PR fixes a long-standing bug where Accelerate behaves differently from PyTorch. In PyTorch, loading a state_dict into a model never changes the model's dtype:

import torch

model = torch.nn.Linear(5, 6)
model_in_half = model.half()  # .half() converts in place and returns the model
state_dict_in_half = model_in_half.state_dict()  # float16 state dict

model = torch.nn.Linear(5, 6).to("meta")  # fresh model, parameters are float32
model.load_state_dict(state_dict_in_half)
model.weight.dtype
# returns torch.float32: the float16 state dict did not change the model's dtype

Currently, Accelerate's load_checkpoint_in_model does the opposite: when loading a checkpoint, it converts the model to the dtype of the state dict. This PR addresses that (see the sketch below).

This PR only contains the fix for now; we still need to discuss whether to maintain backward compatibility in some form (even though this is a bug fix), because diffusers might be relying on the old behavior, cc @patrickvonplaten.
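
For illustration, here is a minimal sketch (not part of the PR itself) of the behavior the fix targets. It assumes accelerate.utils.load_checkpoint_in_model, a writable working directory, and a device_map so that the set_module_tensor_to_device code path is exercised:

import torch
from accelerate.utils import load_checkpoint_in_model

# Save a float16 checkpoint to disk (hypothetical file name).
torch.save(torch.nn.Linear(5, 6).half().state_dict(), "checkpoint.bin")

# Load it into a float32 model without passing an explicit dtype.
model = torch.nn.Linear(5, 6)
load_checkpoint_in_model(model, "checkpoint.bin", device_map={"": "cpu"})

print(model.weight.dtype)
# before this PR: torch.float16 (the model is converted to the checkpoint's dtype)
# after this PR:  torch.float32 (the model keeps its own dtype, like load_state_dict)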

@sgugger sgugger requested a review from muellerzr December 13, 2022 20:03
HuggingFaceDocBuilderDev commented on Dec 13, 2022

The documentation is not available anymore as the PR was closed or merged.

muellerzr (Collaborator) left a comment


Thanks, that makes sense. I don't particularly think there is any "harm" in silently pushing it out (i.e. don't advertise the bad behavior, but let it still pass) in this particular case. If we do care about phasing that out, perhaps leave it for 1.0.0? (Similar to some optimizer bits we have.)

patrickvonplaten (Contributor) commented on Dec 16, 2022

Actually, before merging, could it maybe be better to handle this in set_module_tensor_to_device? E.g. add a dtype argument to the function there? This would make it easier for diffusers to be in line with accelerate, I think - see: https://github.com/huggingface/diffusers/blob/727434c206f6c22b746e460293035a1324f0bc13/src/diffusers/modeling_utils.py#L491

Inline comment on this diff excerpt:

        break

if old_param is not None:
    param = param.to(old_param.dtype)
patrickvonplaten (Contributor) commented on Dec 16, 2022

Would this not be better done in set_module_tensor_to_device? Or maybe additionally add a torch_dtype arg to set_module_tensor_to_device that handles the param correctly when value=param is used?

Inline comment on this diff excerpt:

@@ -680,8 +694,7 @@ def load_checkpoint_in_model(
    else:
        for param_name, param in checkpoint.items():
            module_name = param_name
            if dtype is not None and not str(param.dtype).startswith(("torch.uint", "torch.int", "torch.bool")):
sgugger (Collaborator, Author) replied:

This is moved to set_module_tensor_to_device.
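
For reference, a rough sketch of how the two code paths behave, assuming the dtype argument that this PR adds to accelerate.utils.set_module_tensor_to_device:

import torch
from accelerate.utils import set_module_tensor_to_device

model = torch.nn.Linear(5, 6)
new_weight = torch.randn(6, 5, dtype=torch.float16)

# With no dtype argument, the value is cast to the dtype the model already uses,
# mirroring torch.nn.Module.load_state_dict.
set_module_tensor_to_device(model, "weight", "cpu", value=new_weight)
print(model.weight.dtype)  # torch.float32

# Passing dtype explicitly overrides the model's dtype, e.g. to keep the checkpoint's precision.
set_module_tensor_to_device(model, "weight", "cpu", value=new_weight, dtype=torch.float16)
print(model.weight.dtype)  # torch.float16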

patrickvonplaten (Contributor) left a comment

Thanks a lot for adapting!

@sgugger sgugger merged commit aa53327 into main Dec 20, 2022
@sgugger sgugger deleted the honor_model_dtype branch December 20, 2022 07:48
borzunov added a commit to bigscience-workshop/petals that referenced this pull request on Apr 25, 2023:
- After #285, `load_pretrained_block()` uses `accelerate.utils.set_module_tensor_to_device()`
- In accelerate>=0.16.0, that function stores the tensor in the dtype previously used by the model instead of the dtype of the weights (huggingface/accelerate#920)
- Because of that, blocks and attention caches used float32, which caused OOMs
- This PR makes `load_pretrained_block()` respect `torch_dtype` (default: `"auto"`, which means reading `torch_dtype` from `config.json`); see the sketch below
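
As context, a hedged sketch (not the actual petals code) of what resolving torch_dtype="auto" from a model's config.json typically looks like with transformers:

import torch
from transformers import AutoConfig

def resolve_torch_dtype(model_name_or_path, torch_dtype="auto"):
    # "auto" means: take torch_dtype from the model's config.json, falling back to float32.
    if torch_dtype == "auto":
        config = AutoConfig.from_pretrained(model_name_or_path)
        torch_dtype = getattr(config, "torch_dtype", None) or torch.float32
    # config.json may store the dtype as a string such as "float16".
    if isinstance(torch_dtype, str):
        torch_dtype = getattr(torch, torch_dtype)
    return torch_dtype

# Example (hypothetical model id):
# dtype = resolve_torch_dtype("some-org/some-model")  # e.g. torch.bfloat16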