Extend save_pretrained to offloaded models #27412
Conversation
added hidden subset
debugged hidden subset contrastive search
added contrastive search compression
debugged compressed contrastive search
memory reduction for contrastive search
debugged mem red
added low memory option feature
debugged low mem
added low mem cache
fixed 2047 tensor view
debugged 2042 past key val inputs
reformatted tensors
changed low mem output
Thanks for all your work @blbadger, and sorry for not being active! I tested the PR locally on a big model and it works great. The RAM usage is just as I expected. I left a few comments from testing this PR. I think we can merge soon! (We need to do a patch or a release for accelerate before that.) LMK if you want to finish the PR; otherwise, I can do it.
tests/test_modeling_utils.py
Outdated
"transformer.wte": 0, | ||
"transformer.wpe": 0, | ||
"transformer.h.0": "cpu", | ||
"transformer.h.1": "cpu", | ||
"transformer.h.2": "cpu", | ||
"transformer.h.3": "disk", | ||
"transformer.h.4": "disk", | ||
"transformer.ln_f": 0, | ||
"lm_head": 0, |
Could you make it device-agnostic just like the tests above? You need to pull the latest changes! A minimal sketch of what that could look like is below, assuming the torch_device helper from transformers.testing_utils.
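```python
# Hedged sketch: replace hard-coded GPU indices with the test suite's
# torch_device so the map resolves to cuda, cpu, or another accelerator.
from transformers.testing_utils import torch_device

device_map = {
    "transformer.wte": torch_device,
    "transformer.wpe": torch_device,
    "transformer.h.0": "cpu",
    "transformer.h.1": "cpu",
    "transformer.h.2": "cpu",
    "transformer.h.3": "disk",
    "transformer.h.4": "disk",
    "transformer.ln_f": torch_device,
    "lm_head": torch_device,
}
```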
tests/test_modeling_utils.py
Outdated
model_id = "hf-internal-testing/tiny-random-gpt2"
onloaded_model = AutoModelForCausalLM.from_pretrained(model_id, device_map="cpu")
tokenizer = AutoTokenizer.from_pretrained(model_id)
input_tokens = tokenizer.encode("Four score and seven years ago", return_tensors="pt")
Let's use the same input as the other tests: inputs = torch.tensor([[1, 2, 3]]).to(0)
tests/test_modeling_utils.py
Outdated
self.assertTrue(
    postsaved_memory - presaved_memory < 7e5
)  # shard size (2e5) plus buffer (~4e5), will fail if shard is too large
Let's remove this assert: it will be too flaky for our CI. I tested the PR and it works pretty well for big models. I'm not sure this assert can capture the fact that we won't use more than shard_size of RAM when loading offloaded modules.
src/transformers/modeling_utils.py
Outdated
# Save the model
if state_dict is None:
    # if any model parameters are offloaded to the disk, make module map
    if hasattr(self, "hf_device_map") and "disk" in self.hf_device_map.values():
You forgot to also include the case where "cpu" is in self.hf_device_map.values(). You can use the following check to see if we have offloading: if isinstance(device_map, dict) and ("cpu" in device_map.values() or "disk" in device_map.values()):. If you replace the "disk" value with "cpu" in the test, the test will fail. A self-contained sketch of the check follows.
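```python
# Hedged sketch of the suggested offload check; the helper name is
# hypothetical. hf_device_map is the attribute accelerate attaches to a
# dispatched model, mapping module names to "cpu", "disk", or a GPU index.
def has_offloaded_modules(model) -> bool:
    device_map = getattr(model, "hf_device_map", None)
    return isinstance(device_map, dict) and (
        "cpu" in device_map.values() or "disk" in device_map.values()
    )
```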
src/transformers/modeling_utils.py
Outdated
@@ -117,6 +117,7 @@
     save_offload_index,
     set_module_tensor_to_device,
 )
+from accelerate.utils.modeling import get_state_dict_from_offload
You need to protect the import, since we don't force users to install the latest version of accelerate. You can see how we do it here. For illustration, one minimal way to guard it is sketched below (the fallback pattern is illustrative, not the exact code used in the PR).
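```python
# Guarded import: older accelerate releases do not define
# get_state_dict_from_offload, so fall back to None instead of crashing
# at import time.
try:
    from accelerate.utils.modeling import get_state_dict_from_offload
except ImportError:
    get_state_dict_from_offload = None
```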
src/transformers/modeling_utils.py
Outdated
# remake shard with onloaded parameters if necessary
if module_map:
    # init state_dict for this shard
    state_dict = {name: "" for name in shard}
    for module_name in state_dict.keys():
        module = module_map[module_name]
        # update state dict with onloaded parameters
        state_dict = get_state_dict_from_offload(module, module_name, state_dict)

    # assign shard to be the completed state dict
    shard = state_dict
    del state_dict
    gc.collect()
Could you add a check to see if the users indeed have the latest version of accelerate? See a similar example here. A hedged sketch of such a check follows; the minimum version string is a placeholder, not the actual cutoff the PR settled on.
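```python
import importlib.metadata

from packaging import version


def require_min_accelerate(min_version: str = "0.26.0") -> None:
    # Placeholder threshold: raise early if the installed accelerate
    # predates get_state_dict_from_offload.
    installed = version.parse(importlib.metadata.version("accelerate"))
    if installed < version.parse(min_version):
        raise ImportError(
            f"Saving offloaded models requires accelerate >= {min_version}, "
            f"but found {installed}. Run `pip install -U accelerate`."
        )
```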
@SunMarc thanks very much for taking a look! No worries, I have been very busy too and would not have had much time to work on this before now anyway. I will plan to make time to go through your suggestions tomorrow and will let you know if I can't make the finishing touches myself, in which case you would be more than welcome to do so.
Thanks for these iterations @blbadger. Just a few nits concerning the accelerate version. I've merged the PR on the accelerate side, and we should release a new version this week.
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks! This indeed looks great. cc @amyeroberts for a final look
Thanks for enabling this - it'll be great to have this feature!
postsaved_output = saved_model(inputs)[0]

self.assertTrue(torch.allclose(cpu_output, presaved_output, atol=1e-4))
self.assertTrue(torch.allclose(presaved_output, postsaved_output))
Very nice :)
Thank you again @blbadger for your patience and your work! I really appreciate your contribution 🔥 Congrats on merging this amazing feature!
Happy to contribute! Thanks very much @SunMarc for shepherding this through, and @amyeroberts @muellerzr @ArthurZucker for your reviews.
* added hidden subset * debugged hidden subset contrastive search * added contrastive search compression * debugged compressed contrastive search * memory reduction for contrastive search * debugged mem red * added low memory option feature * debugged mem optmimization output stack * debugged mem optmimization output stack * debugged low mem * added low mem cache * fixed 2047 tensor view * debugged 2042 past key val inputs * reformatted tensors * changed low mem output * final clean * removed subset hidden csearch * fixed hidden device * fixed hidden device * changed compressor dtype * removed hstate compression * integrated csearch in generate * test csearch integration into generation exit() * fixed csearch kwarg integration with generation * final wrap and added doc * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * added debug print * direct hstate cat * direct hstate cat * direct hstate cat debug * direct hstate cat debug * expanded full hidden state stack * expanded full hidden state stack * matched dims for hstates * matched dims for hstates * logits fix * equality test * equality hidden debug * debug * added prints for debug * added prints for debug * equality check * switched squeeze dim * input format debug * tracing top_k_ids * removed trace * added test context * added jitter * added jitter * added jitter * returned state * rebuilt past key value reconstruction * debugged * cleaned traces * added selection for pkv * changed output to dict * cleaned * cleaned * cleaned up contrastive search test * moved low_memory kwarg * debugged * changed low mem test batch size to 1 * removed output * debugged test input shape * reformatted csearch test * added trace * removed unsqueeze on final forward pass * replaced unsqueeze with view * removed traces * cleaned * debugged model kwargs * removed special models from test * ran make quality * Update src/transformers/generation/configuration_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * Update src/transformers/generation/configuration_utils.py Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> * refactored * refactored * refactored * make fixup * renamed flag sequential * renamed flag sequential * iterative onloading * black style and test utils * added traces for integrated test * debugged * added traces * make style * removed traces, make style * included suggestions and added test * debugged test * added offload module check and make style * is_accelerate_available and make style * added test decorator * changed test model and config spec * added offload condition * added lazy loading for each shard * debugged * modified sharding * debugged * added traces * removed safe serialization * no index overload; * trace on safe save ptrs * added ptr condition * debugged * debugged ptr * moved module map init * remake shard only for offloaded modules * refactored * debugged * refactored * debugged * cleaned and make style * cleaned and make style * added trace * sparse module map * debugged * removed module map conditional * refactored * debug * debugged * added traces * added shard mem trace * added shard mem trace * removed underlying storage check * refactored * memory leak removal and make style * cleaned * swapped test 
decs and make style * added mem checks and make style * added free mem warning * implemented some suggestions * moved onloading to accelerate * refactored for accelerate integration * cleaned test * make style * debugged offload map name * cleaned and make style * replaced meta device check for sharding * cleaned and make style * implemented some suggestions * more suggestions * update warning Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * more suggestions * make style * new make style * Update src/transformers/modeling_utils.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> * Update src/transformers/modeling_utils.py Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com> --------- Co-authored-by: Joao Gante <joaofranciscocardosogante@gmail.com> Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com> Co-authored-by: amyeroberts <22614925+amyeroberts@users.noreply.github.com>
Final version looks nice and simple. Thanks all for your hard work!
What does this PR do?
Fixes #20072 and addresses the second part of huggingface/peft#868
Models with offloaded weights are currently incompatible with save_pretrained. This PR allows large models that are loaded onto the GPU and CPU to be saved, which is particularly useful for big models that have undergone merging and unloading via huggingface/peft#1063. The implementation is to iterate through modules and onload parameters to the execution device (typically GPU) before sending the appropriate elements of the state dict to the CPU in-place, where the final state dictionary is assembled and saved. A hedged end-to-end sketch of what this enables is shown below (model id and folder names are illustrative).
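```python
from transformers import AutoModelForCausalLM

# device_map="auto" may place some modules on CPU or disk for large models.
model = AutoModelForCausalLM.from_pretrained(
    "hf-internal-testing/tiny-random-gpt2",
    device_map="auto",
    offload_folder="offload",
)

# Previously this raised for offloaded models; with this PR, each shard's
# offloaded parameters are onloaded before the state dict is written.
model.save_pretrained("saved_model")
```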
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Still working on the tests (some small models are not compatible with offloading due to architectural considerations), but am happy to submit a Colab version with a large model in the meantime :)
Who can review?
Anyone!
@pacman100