Serialization: take into account meta tensor when splitting the state_dict
#2591
Conversation
```python
if tensor.device.type == "meta":
    return None
else:
    return tensor.device, _get_unique_id(tensor), get_torch_storage_size(tensor)
```
I'm fine with the changes you've made 👍
Could you just update the docstring just above to reflect this change? Currently it says:

> Multiple different tensors can share the same underlying storage. For example, "meta" tensors all share the same storage, and thus their identifier will all be equal. This identifier is guaranteed to be unique and constant for this tensor's storage during its lifetime. Two tensor storages with non-overlapping lifetimes may have the same id.

which is no longer true (the function will now always return None for meta tensors).
And thanks for looking into it in the first place!
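For illustration, one way the updated docstring could read (a sketch only, not necessarily the exact wording that was merged; `_get_unique_id` and `get_torch_storage_size` are assumed to be the module's existing helpers):

```python
from typing import Optional, Tuple


def get_torch_storage_id(tensor: "torch.Tensor") -> Optional[Tuple["torch.device", int, int]]:
    """
    Return a unique identifier for a tensor's storage.

    Multiple different tensors can share the same underlying storage. This identifier is
    guaranteed to be unique and constant for this tensor's storage during its lifetime.
    Two tensor storages with non-overlapping lifetimes may have the same id.

    For meta tensors, this returns None: meta tensors carry no data, so we cannot tell
    whether two of them would actually share storage once materialized.
    """
    if tensor.device.type == "meta":
        return None
    else:
        return tensor.device, _get_unique_id(tensor), get_torch_storage_size(tensor)
```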
Done!
Thanks!
Co-authored-by: Lucain <lucain@huggingface.co>
Thanks @SunMarc!
What does this PR do?
Fixes huggingface/transformers#33209 cc @xenova
When meta tensors are present in the state dict, they are all assigned to the same shard file because they share the same storage_id. This produces one very large shard file when using transformers' CPU offload saving functionality.
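To make the failure mode concrete, here is a small check (assuming `get_torch_storage_id` is importable from `huggingface_hub`; before this PR, two same-sized meta tensors produced the same id, which is what caused them to be bucketed together):

```python
import torch
from huggingface_hub import get_torch_storage_id

# Two unrelated parameters that happen to live on the meta device (e.g. CPU-offloaded weights).
a = torch.empty(1024, 1024, device="meta")
b = torch.empty(1024, 1024, device="meta")

# Before this PR: both tensors got the same (device, unique_id, storage_size) tuple,
# so the sharding logic treated them as shared storage and put them in the same shard.
# With this PR: meta tensors have no storage id, i.e. no incorrect deduplication.
print(get_torch_storage_id(a))  # None
print(get_torch_storage_id(b))  # None
```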
If we have meta tensors in the `state_dict`, we should consider that they do not share the same storage. Right now, we are putting the meta tensors that have the same size in the same bucket. Not sure what's the best way to deal with that, I considered:

- Modifying the `split_state_dict_into_shards_factory` function, but that would require adding torch-specific code there. See this commit.
- Modifying `get_torch_storage_id` to return None when we have a meta tensor, since returning None is the default behavior of `get_storage_id` (a toy sketch of this grouping behavior follows below) -> latest commit

Failing tests are not related to this PR.
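A toy illustration of why option 2 is enough (this is not the actual huggingface_hub factory code, just the grouping idea: only tensors with a non-None storage id are deduplicated into the same bucket):

```python
from collections import defaultdict

import torch


def toy_storage_id(tensor):
    """Simplified stand-in for get_torch_storage_id: None for meta tensors (this PR's behavior)."""
    if tensor.device.type == "meta":
        return None
    return (tensor.device, tensor.untyped_storage().data_ptr(), tensor.untyped_storage().nbytes())


emb = torch.zeros(4, 4)
state_dict = {
    "emb.weight": emb,
    "lm_head.weight": emb,                                 # tied weights: genuinely shared storage
    "offloaded.weight": torch.zeros(4, 4, device="meta"),  # offloaded params, no real data
    "offloaded.bias": torch.zeros(4, device="meta"),
}

buckets = defaultdict(list)
for name, tensor in state_dict.items():
    storage_id = toy_storage_id(tensor)
    # None means "no sharing information": keep the tensor in its own bucket.
    key = storage_id if storage_id is not None else ("unique", name)
    buckets[key].append(name)

print(list(buckets.values()))
# [['emb.weight', 'lm_head.weight'], ['offloaded.weight'], ['offloaded.bias']]
```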
Example

I get the right number of shards + the expected `total_size`.
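A hedged sketch of what such a check might look like (tensor names, shapes and the 5MB shard size are made up for the example; `split_torch_state_dict_into_shards` is the public helper whose sharding this PR affects):

```python
import torch
from huggingface_hub import split_torch_state_dict_into_shards

# Hypothetical state dict in which every weight has been offloaded to the meta device.
state_dict = {f"layer_{i}.weight": torch.zeros(1000, 1000, device="meta") for i in range(10)}

split = split_torch_state_dict_into_shards(state_dict, max_shard_size="5MB")

# With this PR, each ~4MB meta tensor lands in its own shard instead of all of them
# being lumped into a single huge file.
print(split.is_sharded)                # True
print(len(split.filename_to_tensors))  # 10 shards
print(split.metadata["total_size"])    # expected: 40_000_000 bytes (10 x 1000 x 1000 float32)
```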