[Trainer] Fix .to call on 4bit models #24444

Merged
2 commits merged into huggingface:main from the fix-4bit-move branch on Jun 23, 2023

Conversation

younesbelkada
Contributor

What does this PR do?

Currently, initializing the Trainer fails in some scenarios that use 4bit models.
Device placement is already correctly skipped for 8bit models, but it needs to be skipped for 4bit models too, because the to operation is not supported for them either.
This PR adds a patch for that case.
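
As a rough illustration of the kind of check described above (not the actual Trainer code), the sketch below decides whether a model should be moved to a device. The helper name `should_place_model_on_device` and the stand-in model objects are hypothetical; only the attribute names `is_loaded_in_8bit` and `is_quantized` come from this thread.

```python
from types import SimpleNamespace


def should_place_model_on_device(model, place_model_on_device=True):
    """Hypothetical helper: return False for quantized models, which must not be moved.

    Assumption: the model exposes `is_loaded_in_8bit` and/or `is_quantized`,
    the two flags discussed in this PR. Missing attributes are treated as False.
    """
    is_8bit = getattr(model, "is_loaded_in_8bit", False)
    is_quantized = getattr(model, "is_quantized", False)
    if is_8bit or is_quantized:
        # 8bit and 4bit models are already on the right device; skip `.to`.
        return False
    return place_model_on_device


# Stand-ins for real models (hypothetical, for illustration only).
full_precision = SimpleNamespace()
eight_bit = SimpleNamespace(is_loaded_in_8bit=True, is_quantized=True)
four_bit = SimpleNamespace(is_loaded_in_8bit=False, is_quantized=True)

for name, m in [("fp", full_precision), ("8bit", eight_bit), ("4bit", four_bit)]:
    print(name, should_place_model_on_device(m))
```

Checking only `is_loaded_in_8bit` would let the 4bit stand-in through, which is the gap this PR describes.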

cc @amyeroberts @lewtun

Collaborator

@amyeroberts amyeroberts left a comment

LGTM - thanks for fixing!

General question: since there are two flags, "is_loaded_in_8bit" and "is_quantized", are we guaranteed that all 8bit (and 4bit) quantized models have these flags set correctly?

@younesbelkada
Contributor Author

@amyeroberts thanks!
Absolutely yes, for reference, here is how we set that attribute:

model.is_quantized = load_in_8bit or load_in_4bit

@lewtun
Member

lewtun commented Jun 23, 2023

I can confirm that this fix resolves the error I was hitting with 4-bit models:

Traceback (most recent call last):
  File "/fsx/lewis/git/h4/scripts/evaluation/run_rm_eval.py", line 275, in <module>
    main()
  File "/fsx/lewis/git/h4/scripts/evaluation/run_rm_eval.py", line 164, in main
    trainer = RewardTrainer(
  File "/fsx/lewis/git/h4/src/h4/training/trainer.py", line 26, in __init__
    super().__init__(*args, **kwargs)
  File "/fsx/lewis/miniconda/envs/h4/lib/python3.10/site-packages/transformers/trainer.py", line 506, in __init__
    self._move_model_to_device(model, args.device)
  File "/fsx/lewis/miniconda/envs/h4/lib/python3.10/site-packages/transformers/trainer.py", line 747, in _move_model_to_device
    model = model.to(device)
  File "/fsx/lewis/miniconda/envs/h4/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1889, in to
    raise ValueError(
ValueError: `.to` is not supported for `4-bit` or `8-bit` models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.
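
The failure mode in the traceback above can be mimicked with a toy class. This is a minimal sketch under stated assumptions, not the transformers implementation: `ToyModel`, `move_model_to_device`, and `try_move` are hypothetical names, and the device is represented as a plain string so that the example runs without a GPU or any third-party library.

```python
class ToyModel:
    """Hypothetical stand-in for a model that may be loaded in 4bit or 8bit."""

    def __init__(self, is_quantized):
        self.is_quantized = is_quantized
        self.device = "cpu"

    def to(self, device):
        # Mirrors the behaviour reported in the traceback: quantized models refuse `.to`.
        if self.is_quantized:
            raise ValueError("`.to` is not supported for `4-bit` or `8-bit` models.")
        self.device = device
        return self


def move_model_to_device(model, device):
    # Unguarded move, analogous to the call that failed during Trainer initialization.
    return model.to(device)


def try_move(model, device):
    """Attempt the move and report the outcome as a string."""
    try:
        move_model_to_device(model, device)
        return "moved"
    except ValueError as err:
        return f"failed: {err}"


print(try_move(ToyModel(is_quantized=False), "cuda:0"))
print(try_move(ToyModel(is_quantized=True), "cuda:0"))
```

The second call fails because nothing checks the quantization flag before calling `to`, which is the situation the fix addresses.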

Thanks for the fast fix @younesbelkada !!

@younesbelkada younesbelkada merged commit 468aed3 into huggingface:main Jun 23, 2023
@younesbelkada younesbelkada deleted the fix-4bit-move branch June 23, 2023 11:35