Fix quantization w/ DeepSpeed not working #32640

muellerzr · 2024-08-12T22:25:31Z

What does this PR do?

This PR tweaks the logic check introduced in #32299 to specifically exclude the case where Zero-3 is enabled and a quantization method is in use (quantized models skip this part of the Zero-3 init).
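The adjusted condition from this PR's diff can be exercised in plain Python. This is a minimal sketch under stated assumptions: `should_flag_missing_zero3_init` is a hypothetical name, the `zero3_enabled` parameter replaces the call to `is_deepspeed_zero3_enabled()`, and `SimpleNamespace` objects stand in for real models. It shows that a trainable quantized model is never flagged, while an unquantized model that missed Zero-3 init is. A model without `_transformers_zero3_init_used` is treated as if Zero-3 init was used, because the `getattr` default is `True`.

```python
from types import SimpleNamespace


def should_flag_missing_zero3_init(model, zero3_enabled):
    """Sketch mirroring the condition in the diff (hypothetical helper name).

    `zero3_enabled` stands in for is_deepspeed_zero3_enabled() so the
    logic can be checked without DeepSpeed installed.
    """
    # Trainable quantized models don't go through the Zero-3 init path,
    # so they are excluded from the check.
    is_trainable_quantized = hasattr(model, "hf_quantizer") and model.hf_quantizer.is_trainable
    return (
        not is_trainable_quantized
        and zero3_enabled
        # Missing attribute defaults to True, i.e. "assume Zero-3 init was used".
        and not getattr(model, "_transformers_zero3_init_used", True)
    )


# Hypothetical stand-ins for real models.
plain_model = SimpleNamespace(_transformers_zero3_init_used=False)
quantized_model = SimpleNamespace(
    hf_quantizer=SimpleNamespace(is_trainable=True),
    _transformers_zero3_init_used=False,
)

print(should_flag_missing_zero3_init(plain_model, zero3_enabled=True))      # True
print(should_flag_missing_zero3_init(quantized_model, zero3_enabled=True))  # False
print(should_flag_missing_zero3_init(plain_model, zero3_enabled=False))     # False
```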

Fixes # (issue)

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@SunMarc can you check if what I've done here makes sense given the comment we're trying to fix?

#32299 (comment)

I believe so, and I think the test makes sense, but your quant eyes would be appreciated :)

HuggingFaceDocBuilderDev · 2024-08-12T22:47:01Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

jphme · 2024-08-13T13:05:43Z

This fixed the issue with finetuning I reported here, thanks!

SunMarc

Thanks for fixing this! Left a comment.

SunMarc · 2024-08-13T14:03:44Z

src/transformers/trainer.py

+        if (
+            not (hasattr(model, "hf_quantizer") and model.hf_quantizer.is_trainable)
+            and is_deepspeed_zero3_enabled()
+            and not getattr(model, "_transformers_zero3_init_used", True)
+        ):


What's the difference when the model is quantized vs. not quantized?

amyeroberts · 2024-08-13T14:25:13Z

For my own understanding: why doesn't the user need to create the model after TrainingArguments in the quantized case, but does in the unquantized case?

muellerzr · 2024-08-13T14:33:45Z

Quantized weights aren't loaded inside the zero_init context manager; they use init_empty_weights instead (I think this answers both of your questions).
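A toy sketch of the two loading paths described above, in pure Python. `toy_load`, `toy_zero3_init`, and `toy_init_empty_weights` are hypothetical stand-ins, not transformers or accelerate code. It assumes that quantized weights go through `init_empty_weights` rather than the zero_init context, and that only the Zero-3 path marks `_transformers_zero3_init_used` as true. Under those assumptions, a quantized model loaded with Zero-3 enabled ends up with the flag set to `False`, which is why the original check from #32299 would flag it even though no re-initialization is needed.

```python
from contextlib import contextmanager, nullcontext
from types import SimpleNamespace


@contextmanager
def toy_zero3_init():
    # Hypothetical stand-in for the Zero-3 "zero_init" context manager.
    yield "zero3_init"


@contextmanager
def toy_init_empty_weights():
    # Hypothetical stand-in for init_empty_weights, used for quantized weights.
    yield "init_empty_weights"


def toy_load(quantized, zero3_enabled):
    """Toy loader (assumption): quantized weights never enter the Zero-3 context."""
    if quantized:
        ctx = toy_init_empty_weights()
    elif zero3_enabled:
        ctx = toy_zero3_init()
    else:
        ctx = nullcontext("default")
    with ctx as path:
        model = SimpleNamespace(init_path=path)
    # Only the Zero-3 path marks the model as initialized under Zero-3.
    model._transformers_zero3_init_used = path == "zero3_init"
    return model


quantized = toy_load(quantized=True, zero3_enabled=True)
print(quantized.init_path, quantized._transformers_zero3_init_used)
# init_empty_weights False
```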

muellerzr · 2024-08-16T15:41:50Z

Closing since we're doing a full revert

Fix quantization

e48ea4d

muellerzr requested review from amyeroberts and SunMarc August 12, 2024 22:25

diff comment

122ff34

SunMarc approved these changes Aug 13, 2024


muellerzr closed this Aug 16, 2024

muellerzr mentioned this pull request Aug 16, 2024

Revert PR 32299, flag users when Zero-3 was missed #32851

Merged

