Serialization: take into account meta tensor when splitting the state_dict
#2591
Conversation
```python
if tensor.device.type == "meta":
    return None
else:
    return tensor.device, _get_unique_id(tensor), get_torch_storage_size(tensor)
```
I'm fine with the changes you've made 👍
Could you just update the docstring just above to reflect this change? Currently it says:

> Multiple different tensors can share the same underlying storage. For example, "meta" tensors all share the same storage, and thus their identifier will all be equal. This identifier is guaranteed to be unique and constant for this tensor's storage during its lifetime. Two tensor storages with non-overlapping lifetimes may have the same id.

which is no longer true (the function will now always return None for meta tensors).
And thanks for looking into it in the first place!
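For illustration, one way the updated docstring could read (a sketch only, not necessarily the exact wording that was merged; `_get_unique_id` and `get_torch_storage_size` are assumed to be the module's existing helpers):

```python
from typing import Optional, Tuple


def get_torch_storage_id(tensor: "torch.Tensor") -> Optional[Tuple["torch.device", int, int]]:
    """
    Return a unique identifier for a tensor's storage.

    Multiple different tensors can share the same underlying storage. This identifier is
    guaranteed to be unique and constant for this tensor's storage during its lifetime.
    Two tensor storages with non-overlapping lifetimes may have the same id.

    For meta tensors, this returns None: meta tensors carry no data, so we cannot tell
    whether two of them would actually share storage once materialized.
    """
    if tensor.device.type == "meta":
        return None
    else:
        return tensor.device, _get_unique_id(tensor), get_torch_storage_size(tensor)
```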
Done!
Thanks!
Co-authored-by: Lucain <lucain@huggingface.co>
Thanks @SunMarc!
What does this PR do?
Fixes huggingface/transformers#33209 cc @xenova
When meta tensors are present in the state dict, they are all assigned to the same shard file because they share the same storage_id. This produces one very large shard file when using transformers' CPU offload saving functionality.
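To make the failure mode concrete, here is a small check (assuming `get_torch_storage_id` is importable from `huggingface_hub`; before this PR, two same-sized meta tensors produced the same id, which is what caused them to be bucketed together):

```python
import torch
from huggingface_hub import get_torch_storage_id

# Two unrelated parameters that happen to live on the meta device (e.g. CPU-offloaded weights).
a = torch.empty(1024, 1024, device="meta")
b = torch.empty(1024, 1024, device="meta")

# Before this PR: both tensors got the same (device, unique_id, storage_size) tuple,
# so the sharding logic treated them as shared storage and put them in the same shard.
# With this PR: meta tensors have no storage id, i.e. no incorrect deduplication.
print(get_torch_storage_id(a))  # None
print(get_torch_storage_id(b))  # None
```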
If we have meta tensors in the `state_dict`, we should consider that they do not share the same storage. Right now, we are putting the meta tensors that have the same size in the same bucket. Not sure what's the best way to deal with that, I considered:

- Modifying the `split_state_dict_into_shards_factory` function, but that would require adding torch-specific code there. See this commit.
- Modifying `get_torch_storage_id` to return None when we have a meta tensor, since returning None is the default behavior of `get_storage_id` (a toy sketch of this grouping behavior follows below) -> latest commit

Failing tests are not related to this PR.
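A toy illustration of why option 2 is enough (this is not the actual huggingface_hub factory code, just the grouping idea: only tensors with a non-None storage id are deduplicated into the same bucket):

```python
from collections import defaultdict

import torch


def toy_storage_id(tensor):
    """Simplified stand-in for get_torch_storage_id: None for meta tensors (this PR's behavior)."""
    if tensor.device.type == "meta":
        return None
    return (tensor.device, tensor.untyped_storage().data_ptr(), tensor.untyped_storage().nbytes())


emb = torch.zeros(4, 4)
state_dict = {
    "emb.weight": emb,
    "lm_head.weight": emb,                                 # tied weights: genuinely shared storage
    "offloaded.weight": torch.zeros(4, 4, device="meta"),  # offloaded params, no real data
    "offloaded.bias": torch.zeros(4, device="meta"),
}

buckets = defaultdict(list)
for name, tensor in state_dict.items():
    storage_id = toy_storage_id(tensor)
    # None means "no sharing information": keep the tensor in its own bucket.
    key = storage_id if storage_id is not None else ("unique", name)
    buckets[key].append(name)

print(list(buckets.values()))
# [['emb.weight', 'lm_head.weight'], ['offloaded.weight'], ['offloaded.bias']]
```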
Example

I get the right number of shards + the expected `total_size`.
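A hedged sketch of what such a check might look like (tensor names, shapes and the 5MB shard size are made up for the example; `split_torch_state_dict_into_shards` is the public helper whose sharding this PR affects):

```python
import torch
from huggingface_hub import split_torch_state_dict_into_shards

# Hypothetical state dict in which every weight has been offloaded to the meta device.
state_dict = {f"layer_{i}.weight": torch.zeros(1000, 1000, device="meta") for i in range(10)}

split = split_torch_state_dict_into_shards(state_dict, max_shard_size="5MB")

# With this PR, each ~4MB meta tensor lands in its own shard instead of all of them
# being lumped into a single huge file.
print(split.is_sharded)                # True
print(len(split.filename_to_tensors))  # 10 shards
print(split.metadata["total_size"])    # expected: 40_000_000 bytes (10 x 1000 x 1000 float32)
```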