Add support for bitsandbytes #15622
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.
Thanks a lot for working on this! I've left a couple of comments.
Great work, @manuelciosici!
Let's add an actual test to it before merging this.
This requirement would be a problem in some use cases, e.g. with DeepSpeed ZeRO-3, which pre-loads the model directly on GPU during `from_pretrained`. I'm not sure about other cases where a model ends up on GPU - if I'm not mistaken DS is the only one, @sgugger?
Commented here:
Normally, when creating the optimizer, the model has been moved to the proper device already, except in the following cases:
So this is not an exception, as it's not on CPU - we just don't do it in the Trainer, but the modeling code does.
Yes, except DeepSpeed ZeRO-3, where it's already moved to GPU - we just don't do it in the Trainer.
Check - but it's irrelevant to the optimizer. So to summarize Sylvain's list of exceptions: in the general case the model should already be on GPU. So we need to wait for Tim to let us know if that's a problem or whether it has a workaround.
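For context, the order-sensitive flow this thread worries about looked roughly like this - a sketch following the bitsandbytes README of the time, with `MyModel` as a placeholder:

```python
# A sketch of the order-sensitive override flow under discussion, based on
# the bitsandbytes README of the time; MyModel is a placeholder.
import torch
import bitsandbytes as bnb

class MyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = torch.nn.Linear(64, 64)

model = MyModel()
mng = bnb.optim.GlobalOptimManager.get_instance()

# 1. Parameters had to be registered while the model was still on CPU...
mng.register_parameters(model.parameters())

# 2. ...before moving it to GPU - exactly the step that DeepSpeed ZeRO-3
# breaks by pre-loading the model directly on GPU.
model = model.cuda()

adam = bnb.optim.Adam(model.parameters(), lr=1e-3, optim_bits=8)

# 3. Individual parameters could then be overridden to keep 32-bit state.
mng.override_config(model.fc1.weight, "optim_bits", 32)
```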
@TimDettmers, if you get a chance, could you please address some of the questions directed to you, so that this PR can be unblocked and BNB integration added to the HF Trainer? Thank you!
The new implementation of the override no longer depends on when the model is transferred to the GPU or when the override is registered. It takes the following signature: `GlobalOptimManager.get_instance().register_module_override(module, 'weight', {'optim_bits': 32})`, where `module` is the module whose parameter should use the override.
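Put together, the Trainer-side integration can hook this up per module - a minimal sketch, assuming `bitsandbytes` is installed; the model and layer names are illustrative:

```python
# A minimal sketch of the new override API; assumes bitsandbytes is
# installed, and uses a toy model with illustrative names.
import torch
import bitsandbytes as bnb

model = torch.nn.ModuleDict({"embed": torch.nn.Embedding(1000, 64)}).cuda()

# 8-bit Adam for all parameters by default.
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=1e-3)

# Keep full 32-bit optimizer state for the embedding weights - the
# parameters this PR overrides in the Trainer. With the new API, the
# registration order relative to .cuda() no longer matters.
manager = bnb.optim.GlobalOptimManager.get_instance()
manager.register_module_override(model["embed"], "weight", {"optim_bits": 32})
```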
These issues should be resolved with the new parameter override, which is independent of when the parameters are transferred to the device.
The documentation is not available anymore as the PR was closed or merged.
This looks great now.
Thank you for working on this, @manuelciosici!
And thank you @TimDettmers for supporting the sorting out process!
Let's ask @sgugger to have another look before we merge this.
Thanks for all the work on this. It's almost ready to be merged, I just have a small request: replace `is_bnb_available` by `is_bitsandbytes_available` everywhere. Since we have a lot of those `is_xxx_available` helpers and not all contributors might know this library, it will make it clearer to everyone what this is :-)
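For illustration, the `is_xxx_available` pattern typically boils down to an import probe - a minimal sketch, assuming detection via `importlib` (transformers keeps the real helpers in its import utilities):

```python
# A minimal sketch of the is_xxx_available naming convention; assumes the
# check is a simple importlib probe, as such helpers typically are.
import importlib.util

def is_bitsandbytes_available() -> bool:
    # True when the bitsandbytes package can be imported.
    return importlib.util.find_spec("bitsandbytes") is not None
```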
Just waiting for @sgugger to have one last look after moving the `require_*` helpers to `testing_utils.py`.
All good, thanks again for all the work on this!
* Add initial BNB integration
* fixup! Add initial BNB integration
* Add bnb test decorator
* Update Adamw8bit option name
* Use the full bnb package name
* Overide bnb for all embedding layers
* Fix package name
* Formatting
* Remove unnecessary import
* Update src/transformers/trainer.py
Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
* Rename AdamwBNB optimizer option
* Add training test checking that bnb memory utilization is lower
* fix merge
* fix merge; fix + extend new test
* cleanup
* expand bnb
* move all require_* candidates to testing_utils.py

Co-authored-by: Stas Bekman <stas00@users.noreply.github.com>
Co-authored-by: Stas Bekman <stas@stason.org>
Hi there!
This test should be fixed in #18584 because of a very small typo. As for the second test, I suspect it has never been run on our side, since it is the only test that requires `bitsandbytes`.
Thank you, @younesbelkada. @ydshieh, do you think it'd be OK to add it to the CI? The installation is just CUDA-version specific:
https://github.com/facebookresearch/bitsandbytes#requirements--installation
@stas00 I could add it and see how things go. But @younesbelkada added it to the scheduled CI (which means it runs on GPU) with …
I am a bit confused by why there was no …
Thanks @stas00 and @ydshieh! `pip install bitsandbytes` should be sufficient for now (I have to update the Dockerfile though).
Here is the repo we have to refer to: https://github.com/TimDettmers/bitsandbytes
Oh, OK, I missed that you already added it - nothing to do then. @TimDettmers, would it be possible to archive the original repo and post a link to the new repo at the top of its README? Otherwise users will have no idea they should use the new repo instead. Thank you!
Also note that we are linking to the old repo:
@TimDettmers, should we fix those to point to the new repo instead?
Hi @younesbelkada Are you running inside a Docker container on a VM similar to the CI runners (Nvidia T4)?
Hi @ydshieh! I am running on a VM similar to the CI runners; let me retry to reproduce as you suggested.
The test is passing on my VM. For the VM I get: …
But I re-ran the test with …
EDIT: I saw that even on multi-GPU the test was failing on the Docker container.
In a single-GPU setup:
What does this PR do?
Fixes #14819
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.
@stas00 @sgugger @TimDettmers
Status
Should we override `bitsandbytes` for all `Embedding` layers? It seems to work fine for RoBERTa and for GPT-2.
I ran `run_mlm.py` and `run_clm.py` from the examples directory to check that the code runs. Using RTX A6000 GPUs, I see 21040MiB / 49140MiB and 21042MiB / 49140MiB of GPU memory in use with the 8-bit optimizer, versus 36906MiB / 49140MiB without it.
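For reference, once merged, the optimizer can be enabled through the Trainer's `optim` argument - a minimal sketch, assuming the option name the PR settled on (`adamw_bnb_8bit`) and that `bitsandbytes` is installed:

```python
# A minimal sketch of enabling the 8-bit optimizer via TrainingArguments;
# assumes the final option name is "adamw_bnb_8bit" and that bitsandbytes
# is installed. Model and dataset setup are elided.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="out",
    optim="adamw_bnb_8bit",  # swap the default AdamW for bitsandbytes' 8-bit AdamW
)
```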