Add bitsandbytes support for gpt2 models #24504
Conversation
This looks great! Thanks so much for adding this support and for the clean implementation.
FYI, on my side I get these failing tests; I believe there might be a small difference between our envs. We can always update the expected sentences later in case they fail on the daily CI (which will probably be the case). I'm also happy to add the missing test in a follow-up PR.
Also, one test is failing for 4-bit:
FAILED tests/bnb/test_4bit.py::Bnb4BitGPT2Test::test_memory_footprint - AttributeError: 'GPT2MLP' object has no attribute 'dense_4h_to_h'
Could you quickly address a fix? 🙏 After that we should be ready to merge.
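One possible direction, sketched here as an assumption about how the test resolves a representative linear layer; the helper name `get_some_linear_layer` and the exact attribute paths are illustrative, not necessarily the actual test code:

```python
# Hypothetical sketch: dispatch on the architecture, since GPT2's MLP exposes
# Conv1D modules named `c_fc`/`c_proj` rather than BLOOM-style `dense_4h_to_h`.
def get_some_linear_layer(model):
    if model.config.model_type == "gpt2":
        return model.transformer.h[0].mlp.c_fc
    return model.transformer.h[0].mlp.dense_4h_to_h
```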
module.bias is not None,
quantization_config.bnb_4bit_compute_dtype,
compress_statistics=quantization_config.bnb_4bit_use_double_quant,
quant_type=quantization_config.bnb_4bit_quant_type,
)
has_been_replaced = True
# Store the module class in case we need to transpose the weight later
model._modules[name].source_cls = type(module)
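For context, a minimal sketch of how the stored source class could be used later, assuming transformers' Conv1D weight layout of `(in_features, out_features)`; the helper name is hypothetical:

```python
import torch
from transformers.pytorch_utils import Conv1D

def maybe_transpose_weight(weight: torch.Tensor, source_cls: type) -> torch.Tensor:
    # Conv1D stores weights as (in_features, out_features), while nn.Linear and
    # the bitsandbytes layers expect (out_features, in_features), so transpose.
    if issubclass(source_cls, Conv1D):
        return weight.t()
    return weight
```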
Very nice!
Looking great! Thanks for this great addition and for adding Conv1D support to bnb quantization!
cc @SunMarc for your information, this might be of interest to you :D
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Thanks for adding support for this~!
* feat: Add `_build_conversation_input_ids` to GPT-SW3 tokenizer, adjust line length
* feat: Merge in PR #24504. This allows the GPT-SW3 models (and other GPT-2 based models) to be 4-bit quantised using `load_in_4bit` with `bitsandbytes`.
* fix: F-string
* fix: F-string
* fix: Remove EOS token from all responses
* fix: Remove redundant newlines
* feat: Add `load_in_4bit` to `Pipeline`
* fix: Separate turns with `\n<s>\n` rather than `<s>`
* fix: Add missing newline in prompt
* tests: Add unit tests for the new `_build_conversation_input_ids` method
* style: Automatic style correction
* tests: Compare encodings rather than decodings
* fix: Remove `load_in_4bit` from pipeline arguments
* docs: Add description and references of the GPT-SW3 chat format
* style: Line breaks
* Apply suggestions from code review: Fix Conversation type hints (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* fix: Import TYPE_CHECKING
* style: Run automatic fixes
* tests: Remove `_build_conversation_input_ids` unit tests
* tests: Remove import of `Conversation` in GPT-SW3 unit test
* style: Revert formatting
* style: Move TYPE_CHECKING line after all imports
* style: Imports order
* fix: Change prompt to ensure that `sp_model.encode` and `encode` yield the same result
* docs: Add TODO comment related to the addition of whitespace during decoding
* style: Automatic style checks
* fix: Remove final whitespace in prompt, as prefix whitespace is used by sentencepiece

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
What does this PR do?
The current bitsandbytes integration only supports models that use nn.Linear, which excludes gpt2 and other models that use Conv1D instead. This PR enables loading and serializing these models with bitsandbytes quantization, and adds gpt2-xl tests for int8 and 4-bit.
This is achieved by transposing the weight matrices of Conv1D layers before quantization.
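Roughly, the idea is the following (a sketch under the assumption of transformers' Conv1D weight layout; the helper name is hypothetical and this is not the exact integration code):

```python
import torch
import bitsandbytes as bnb
from transformers.pytorch_utils import Conv1D

def conv1d_to_linear4bit(module: Conv1D, compute_dtype=torch.float16):
    # Conv1D stores its weight as (in_features, out_features); nn.Linear and the
    # bitsandbytes layers expect (out_features, in_features), hence the transpose.
    in_features, out_features = module.weight.shape
    linear = bnb.nn.Linear4bit(
        in_features,
        out_features,
        bias=module.bias is not None,
        compute_dtype=compute_dtype,
    )
    # The actual 4-bit quantization happens when the layer is moved to the GPU;
    # here we only hand over the transposed weight (and the bias, if any).
    linear.weight = bnb.nn.Params4bit(
        module.weight.data.t().contiguous(), requires_grad=False
    )
    if module.bias is not None:
        linear.bias = torch.nn.Parameter(module.bias.data, requires_grad=False)
    return linear
```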
Note: Following the suggestion in the bnb tests to only use models with >1B params, the only remaining candidate is gpt2-xl, which is unfortunately a 6.4 GB download because it is stored in float32.
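For reference, a minimal usage sketch of what this enables (assumes a CUDA GPU plus bitsandbytes and accelerate installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load gpt2-xl quantized to 4-bit; before this PR, Conv1D-based models such as
# gpt2 were not supported by the bitsandbytes integration.
model = AutoModelForCausalLM.from_pretrained("gpt2-xl", load_in_4bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```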
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@younesbelkada, @TimDettmers