Use shard saving from huggingface_hub #2795

SunMarc · 2024-05-21T16:57:38Z

What does this PR do?

This PR replaces the shard saving utility function with the one from huggingface_hub. The goal is to make it easier to maintain this piece of code, which is used across many libraries such as transformers, peft and diffusers.
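For readers unfamiliar with shard splitting, here is a minimal sketch of the general idea, written in plain Python so it runs without torch. It is not the huggingface_hub implementation: `SplitResult`, `split_into_shards`, the greedy strategy and the filename pattern are all assumptions made for illustration. Only the `filename_to_tensors` mapping mirrors the attribute used in this PR's diff.

```python
from dataclasses import dataclass, field


@dataclass
class SplitResult:
    """Hypothetical stand-in for the object a shard-splitting helper returns."""
    is_sharded: bool
    filename_to_tensors: dict = field(default_factory=dict)


def split_into_shards(sizes, max_shard_size, pattern="model{suffix}.safetensors"):
    """Greedily group tensor names into shards of at most max_shard_size bytes.

    `sizes` maps tensor name -> size in bytes. A tensor larger than the
    limit ends up in a shard of its own.
    """
    shards, current, current_size = [], [], 0
    for name, size in sizes.items():
        # Start a new shard when adding this tensor would exceed the limit.
        if current and current_size + size > max_shard_size:
            shards.append(current)
            current, current_size = [], 0
        current.append(name)
        current_size += size
    if current:
        shards.append(current)

    # A single shard keeps the plain filename, without an index suffix.
    if len(shards) <= 1:
        only = shards[0] if shards else []
        return SplitResult(False, {pattern.format(suffix=""): only})

    total = len(shards)
    mapping = {
        pattern.format(suffix=f"-{i + 1:05d}-of-{total:05d}"): names
        for i, names in enumerate(shards)
    }
    return SplitResult(True, mapping)


# Sizes in bytes are made up for the example.
sizes = {"embed.weight": 600, "layer.0.weight": 300, "layer.1.weight": 300, "head.weight": 500}
split = split_into_shards(sizes, max_shard_size=1000)
for filename, names in split.filename_to_tensors.items():
    print(filename, names)
```

With a limit of 1000 bytes, the four tensors above land in two shards. A caller can then iterate over `filename_to_tensors` and save each shard to its own file.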

TODO:

Decide if we should also use id_tensor_storage from huggingface_hub or not (it is called get_storage_id there). We use it in clean_state_dict_for_safetensors to preprocess the state_dict before sharding it.
Edit: after some thought, it is better to just use get_storage_id from huggingface_hub. I will use it when it becomes public.
Also, we need to wait for the release on huggingface_hub, after this PR gets merged.
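To illustrate why a storage id matters before sharding, here is a rough analogy in plain Python. It assumes that the preprocessing amounts to detecting entries that share one underlying storage and keeping a single name for each. `storage_id` and `drop_shared_entries` are hypothetical helpers, and `memoryview` objects over a `bytearray` stand in for tensors; none of this is the accelerate or huggingface_hub code.

```python
def storage_id(view):
    """Identify the buffer backing a memoryview (analogue of a tensor storage id)."""
    return id(view.obj)


def drop_shared_entries(state):
    """Keep the first name seen for each underlying buffer and record the dropped aliases."""
    seen = {}      # storage id -> first name that used it
    cleaned = {}   # entries that survive the cleanup
    aliases = {}   # dropped name -> name that was kept
    for name, view in state.items():
        key = storage_id(view)
        if key in seen:
            aliases[name] = seen[key]
        else:
            seen[key] = name
            cleaned[name] = view
    return cleaned, aliases


embedding = bytearray(16)
state = {
    "embed.weight": memoryview(embedding),
    "lm_head.weight": memoryview(embedding),  # tied weight: same buffer
    "layer.weight": memoryview(bytearray(8)),
}
cleaned, aliases = drop_shared_entries(state)
print(list(cleaned), aliases)
```

Without this step, a tied weight would be counted (and possibly written) twice, which is why the cleanup runs before the split.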

This is a POC for now. I need to test it a bit and also see if it is doable to make the same change in other libraries.

cc @Wauplin @sayakpaul

sayakpaul · 2024-05-22T11:00:31Z

src/accelerate/accelerator.py

+        for filename, tensors in state_dict_split.filename_to_tensors.items():
+            shard = {tensor: state_dict[tensor] for tensor in tensors}
+            self.save(shard, os.path.join(save_directory, filename), safe_serialization=safe_serialization)


Could this be done in a threaded manner, or not really, so as to preserve readability?

I'll have a look!

Not sure why threading would matter here; this only happens on the main accelerate process, on purpose.
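For reference, a threaded variant of the loop in the diff could look roughly like the sketch below. It is only an illustration of the question raised in this thread, not code from the PR: `save_shard` is a hypothetical stand-in that writes JSON, and the data is made up. Whether threads would help at all depends on how much of the real save is I/O-bound.

```python
import json
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor


def save_shard(path, shard):
    # Stand-in for a real serializer: writes one shard as JSON and returns its path.
    with open(path, "w") as f:
        json.dump(shard, f)
    return path


# Made-up data shaped like the objects in the diff.
state_dict = {"a": [1, 2], "b": [3], "c": [4, 5, 6]}
filename_to_tensors = {"shard-1.json": ["a", "b"], "shard-2.json": ["c"]}

save_directory = tempfile.mkdtemp()
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = []
    for filename, names in filename_to_tensors.items():
        shard = {name: state_dict[name] for name in names}
        path = os.path.join(save_directory, filename)
        futures.append(pool.submit(save_shard, path, shard))
    # result() re-raises any exception that occurred in a worker thread.
    written = sorted(os.path.basename(f.result()) for f in futures)

print(written)
```

The sequential loop stays shorter and easier to follow, which is the readability trade-off mentioned above.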

src/accelerate/utils/modeling.py

muellerzr

Thanks for updating us to use the 🤗 Hub!

This looks great! Made some comments (in others) on deprecation and some general advice. Otherwise looks great (once tests pass)

src/accelerate/accelerator.py

Wauplin · 2024-05-24T16:52:55Z

We use it in clean_state_dict_for_safetensors to preprocess the state_dict before sharding it.

Haven't tried locally, but is clean_state_dict_for_safetensors even needed now that you use huggingface_hub's implementation? If yes, we might think of a way to clean this in split_torch_state_dict_into_shards directly (or later, if we implement a real save_state_dict method in huggingface_hub).

Otherwise, if accelerate.id_tensor_storage and huggingface_hub.get_storage_id do the same job, it's better to use the huggingface_hub one. I can make it public/documented if needed.

Also, we need to wait for the release on huggingface_hub, after this huggingface/huggingface_hub#2286 gets merged.

It's merged. I can make a patch release on Monday :)

Wauplin · 2024-05-27T08:34:26Z

@SunMarc Patch release 0.23.2 has been shipped: https://github.com/huggingface/huggingface_hub/releases/tag/v0.23.2. Feel free to use it :)

SunMarc · 2024-05-27T12:26:58Z

Haven't tried locally, but is clean_state_dict_for_safetensors even needed now that you use huggingface_hub's implementation? If yes, we might think of a way to clean this in split_torch_state_dict_into_shards directly (or later, if we implement a real save_state_dict method in huggingface_hub).

Yes, we still need that. I think it would be better to include this in a real save_state_dict method in huggingface_hub (and keep it as a separate function). I will check if we can do something that also works for the transformers library.

Otherwise, if accelerate.id_tensor_storage and huggingface_hub.get_storage_id do the same job, it's better to use the huggingface_hub one. I can make it public/documented if needed.

Sounds good! However, I think that if you have a function that cleans the state_dict just like here, we don't even need to make it public, since we only use that function when cleaning the state dict and splitting it.

HuggingFaceDocBuilderDev · 2024-05-27T12:55:57Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Wauplin

Looks good to me! Thanks for integrating this :)


muellerzr · 2024-06-07T14:03:42Z

Merging now that release has been cut (so this is not included while we test!)

use shard saving from huggingface hub

33d9f06

SunMarc requested a review from muellerzr May 21, 2024 16:58

sayakpaul reviewed May 22, 2024

View reviewed changes

src/accelerate/utils/modeling.py Outdated Show resolved Hide resolved

muellerzr approved these changes May 22, 2024

View reviewed changes

src/accelerate/accelerator.py Outdated Show resolved Hide resolved

SunMarc added 2 commits May 22, 2024 15:22

move import

1cc1a9d

add shard_checkpoint back but with deprecation msg

47c4285

add shard_checkpoint back

cc95ef4

SunMarc mentioned this pull request May 28, 2024

Use huggingface_hub helper function to split state dict huggingface/transformers#31091

Merged

Wauplin approved these changes May 29, 2024

View reviewed changes

Merge remote-tracking branch 'upstream/main' into use-shard-saving-fr…

08a1603

…om-hf-hub

muellerzr merged commit f0049b2 into huggingface:main Jun 7, 2024
23 checks passed
