
Add bitsandbytes support for gpt2 models #24504

Merged (5 commits into huggingface:main, Jun 28, 2023)

Conversation

DarioSucic (Contributor)

What does this PR do?

The current bitsandbytes integration only supports models that use nn.Linear, which excludes gpt2 and other models that use Conv1D instead. This PR enables loading and serialization of these models, and adds gpt2-xl tests for int8 and 4-bit quantization.

This is achieved by transposing the weight matrices of Conv1D layers before quantization.
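To make the layout issue concrete, here is a minimal sketch of the idea (illustrative only, not the PR's actual code; `quantize_conv1d_as_int8` is a hypothetical helper): transformers' `Conv1D` stores its weight as `(in_features, out_features)`, the transpose of `nn.Linear`'s `(out_features, in_features)`, so the weight is transposed before being handed to a bitsandbytes linear layer.

```python
import bitsandbytes as bnb
import torch
from transformers.pytorch_utils import Conv1D

def quantize_conv1d_as_int8(module: Conv1D) -> bnb.nn.Linear8bitLt:
    """Hypothetical helper: wrap a Conv1D's weights in a bitsandbytes int8 layer."""
    # Conv1D.weight has shape (in_features, out_features).
    in_features, out_features = module.weight.shape
    qlinear = bnb.nn.Linear8bitLt(
        in_features,
        out_features,
        bias=module.bias is not None,
        has_fp16_weights=False,
    )
    # Transpose to the (out_features, in_features) layout a linear layer expects.
    qlinear.weight = bnb.nn.Int8Params(
        module.weight.data.T.contiguous(), requires_grad=False
    )
    if module.bias is not None:
        qlinear.bias = torch.nn.Parameter(module.bias.data, requires_grad=False)
    return qlinear
```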

Note: Following the suggestion in the bnb tests to use only models with >1B parameters leaves gpt2-xl as the sole option, which is unfortunately a 6.4 GB download because the weights are stored in float32.
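For reference, loading gpt2-xl with the quantization this PR enables follows the standard bitsandbytes usage in transformers (a sketch of the public API, not code from this PR):

```python
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# int8: weights are quantized to 8-bit as the checkpoint is loaded.
model_int8 = AutoModelForCausalLM.from_pretrained(
    "gpt2-xl", load_in_8bit=True, device_map="auto"
)

# 4-bit: NF4 quantization with double quantization of the statistics.
model_4bit = AutoModelForCausalLM.from_pretrained(
    "gpt2-xl",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
    ),
    device_map="auto",
)
```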

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline, Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

@younesbelkada, @TimDettmers

@HuggingFaceDocBuilderDev commented Jun 27, 2023

The documentation is not available anymore as the PR was closed or merged.

@younesbelkada (Contributor) left a comment

This looks great! Thanks so much for adding this support and for the clean implementation.

FYI, on my side I get these failing tests; I believe there might be a small difference between our environments. We can always update the expected sentence later in case they fail on the daily CI (which will probably be the case). Happy also to add the missing test in a follow-up PR.

[Screenshot: failing test output, 2023-06-27 16:08]

Also one test is failing for 4bit:

FAILED tests/bnb/test_4bit.py::Bnb4BitGPT2Test::test_memory_footprint - AttributeError: 'GPT2MLP' object has no attribute 'dense_4h_to_h'

Could you quickly address a fix? 🙏 After that we should be ready to merge.

tests/bnb/test_mixed_int8.py (outdated review thread, resolved)
module.bias is not None,
quantization_config.bnb_4bit_compute_dtype,
compress_statistics=quantization_config.bnb_4bit_use_double_quant,
quant_type=quantization_config.bnb_4bit_quant_type,
)
has_been_replaced = True
# Store the module class in case we need to transpose the weight later
model._modules[name].source_cls = type(module)

Very nice!
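For readers following along, a sketch of why the class is stored (illustrative names, under the assumption that `source_cls` is consulted when checkpoint tensors are assigned into the quantized module):

```python
from transformers.pytorch_utils import Conv1D

def prepare_weight_for_quantized_module(module, value):
    """Hypothetical helper: fix up checkpoint weights for a replaced layer."""
    # Conv1D checkpoints store weights as (in_features, out_features), while the
    # replacement bnb linear layer expects (out_features, in_features), so weights
    # coming from a Conv1D source class are transposed before assignment.
    if getattr(module, "source_cls", None) is Conv1D:
        value = value.T
    return value
```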

@DarioSucic (Contributor, Author)

FYI, on my side I get these failing tests; I believe there might be a small difference between our environments. We can always update the expected sentence later in case they fail on the daily CI (which will probably be the case). Happy also to add the missing test in a follow-up PR.
[Screenshot: failing test output, 2023-06-27 16:08]

Aha, it's been stable for me so far, but I can see that happening. If it's any help, I'm running this on an RTX 4090 with torch==2.1.0.dev20230603+cu121.

Also one test is failing for 4bit:

FAILED tests/bnb/test_4bit.py::Bnb4BitGPT2Test::test_memory_footprint - AttributeError: 'GPT2MLP' object has no attribute 'dense_4h_to_h'

Could you quickly address a fix? 🙏 After that we should be ready to merge.

Nice catch! I have a fix in mind that should also remove most of the int8 test code I added, so I'll get that in asap.
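One plausible shape for such a fix, purely as a guess at the approach (class and attribute names here are illustrative, not the PR's code): move the model-specific pieces of the bnb test suite into class attributes, so a GPT-2 test only overrides names like the MLP projection that `test_memory_footprint` inspects.

```python
class BaseMixedInt8Test:
    # Illustrative defaults; GPT2MLP has no `dense_4h_to_h` attribute,
    # hence the AttributeError above.
    model_name = "bigscience/bloom-1b7"
    mlp_proj_name = "dense_4h_to_h"

class MixedInt8GPT2Test(BaseMixedInt8Test):
    model_name = "gpt2-xl"
    mlp_proj_name = "c_proj"  # GPT-2's Conv1D projection
```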

@DarioSucic DarioSucic requested a review from younesbelkada June 27, 2023 16:32
@younesbelkada (Contributor) left a comment

Looking great! Thanks for this great addition and for adding Conv1D support to bnb quantization!
cc @SunMarc for your information, this might be of interest to you :D

@younesbelkada younesbelkada requested a review from sgugger June 27, 2023 21:36
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
@sgugger (Collaborator) left a comment

Thanks for adding support for this!

@younesbelkada younesbelkada merged commit 1224092 into huggingface:main Jun 28, 2023
amyeroberts pushed a commit that referenced this pull request Jul 7, 2023
* feat: Add `_build_conversation_input_ids` to GPT-SW3 tokenizer, adjust line length

* feat: Merge in PR #24504.

This allows the GPT-SW3 models (and other GPT-2 based models) to be 4-bit quantised
using `load_in_4bit` with `bitsandbytes` (a usage sketch follows this commit log).

* fix: F-string

* fix: F-string

* fix: Remove EOS token from all responses

* fix: Remove redundant newlines

* feat: Add `load_in_4bit` to `Pipeline`

* fix: Separate turns with `\n<s>\n` rather than `<s>`

* fix: Add missing newline in prompt

* tests: Add unit tests for the new `_build_conversation_input_ids` method

* style: Automatic style correction

* tests: Compare encodings rather than decodings

* fix: Remove `load_in_4bit` from pipeline arguments

* docs: Add description and references of the GPT-SW3 chat format

* style: Line breaks

* Apply suggestions from code review

Fix Conversation type hints

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

* fix: Import TYPE_CHECKING

* style: Run automatic fixes

* tests: Remove `_build_conversation_input_ids` unit tests

* tests: Remove import of `Conversation` in GPT-SW3 unit test

* style: Revert formatting

* style: Move TYPE_CHECKING line after all imports

* style: Imports order

* fix: Change prompt to ensure that `sp_model.encode` and `encode` yields same result

* docs: Add TODO comment related to the addition of whitespace during decoding

* style: Automatic style checks

* fix: Remove final whitespace in prompt, as prefix whitespace is used by sentencepiece

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
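As the commits above note, GPT-SW3 (and other GPT-2 based) checkpoints can now be 4-bit quantised; since `load_in_4bit` was ultimately removed from the `Pipeline` arguments, one way to use it is to forward it through `model_kwargs` (a sketch; the repo id is an assumed example):

```python
from transformers import pipeline

# `load_in_4bit` is not a Pipeline argument (see the commits above), but
# pipeline() forwards model_kwargs to from_pretrained, which accepts it.
generator = pipeline(
    "text-generation",
    model="AI-Sweden-Models/gpt-sw3-126m",  # assumed repo id; substitute your checkpoint
    model_kwargs={"load_in_4bit": True, "device_map": "auto"},
)
print(generator("Stockholm is", max_new_tokens=20)[0]["generated_text"])
```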