Add bitsandbytes support for gpt2 models #24504
Conversation
This looks great! Thanks so much for adding this support and for the clean implementation.
FYI, on my side I get these failing tests; I believe there might be a small difference between our envs. We can always update the expected sentences later in case they fail on the daily CI (which will probably be the case). I'm also happy to add the missing test in a follow-up PR.
Also, one test is failing for 4-bit:
FAILED tests/bnb/test_4bit.py::Bnb4BitGPT2Test::test_memory_footprint - AttributeError: 'GPT2MLP' object has no attribute 'dense_4h_to_h'
Could you quickly address a fix? 🙏 After that we should be ready to merge.
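One possible direction, sketched here as an assumption about how the test resolves a representative linear layer; the helper name `get_some_linear_layer` and the exact attribute paths are illustrative, not necessarily the actual test code:

```python
# Hypothetical sketch: dispatch on the architecture, since GPT2's MLP exposes
# Conv1D modules named `c_fc`/`c_proj` rather than BLOOM-style `dense_4h_to_h`.
def get_some_linear_layer(model):
    if model.config.model_type == "gpt2":
        return model.transformer.h[0].mlp.c_fc
    return model.transformer.h[0].mlp.dense_4h_to_h
```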
module.bias is not None,
quantization_config.bnb_4bit_compute_dtype,
compress_statistics=quantization_config.bnb_4bit_use_double_quant,
quant_type=quantization_config.bnb_4bit_quant_type,
)
has_been_replaced = True
# Store the module class in case we need to transpose the weight later
model._modules[name].source_cls = type(module)
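For context, a minimal sketch of how the stored source class could be used later, assuming transformers' Conv1D weight layout of `(in_features, out_features)`; the helper name is hypothetical:

```python
import torch
from transformers.pytorch_utils import Conv1D

def maybe_transpose_weight(weight: torch.Tensor, source_cls: type) -> torch.Tensor:
    # Conv1D stores weights as (in_features, out_features), while nn.Linear and
    # the bitsandbytes layers expect (out_features, in_features), so transpose.
    if issubclass(source_cls, Conv1D):
        return weight.t()
    return weight
```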
Very nice!
Looking great! Thanks for this great addition and for adding Conv1D support to bnb quantization!
cc @SunMarc for your information, this might be of interest to you :D
Co-authored-by: Younes Belkada <49240599+younesbelkada@users.noreply.github.com>
Thanks for adding support for this~!
* feat: Add `_build_conversation_input_ids` to GPT-SW3 tokenizer, adjust line length
* feat: Merge in PR #24504. This allows the GPT-SW3 models (and other GPT-2 based models) to be 4-bit quantised using `load_in_4bit` with `bitsandbytes`.
* fix: F-string
* fix: F-string
* fix: Remove EOS token from all responses
* fix: Remove redundant newlines
* feat: Add `load_in_4bit` to `Pipeline`
* fix: Separate turns with `\n<s>\n` rather than `<s>`
* fix: Add missing newline in prompt
* tests: Add unit tests for the new `_build_conversation_input_ids` method
* style: Automatic style correction
* tests: Compare encodings rather than decodings
* fix: Remove `load_in_4bit` from pipeline arguments
* docs: Add description and references of the GPT-SW3 chat format
* style: Line breaks
* Apply suggestions from code review: Fix Conversation type hints (Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>)
* fix: Import TYPE_CHECKING
* style: Run automatic fixes
* tests: Remove `_build_conversation_input_ids` unit tests
* tests: Remove import of `Conversation` in GPT-SW3 unit test
* style: Revert formatting
* style: Move TYPE_CHECKING line after all imports
* style: Imports order
* fix: Change prompt to ensure that `sp_model.encode` and `encode` yield the same result
* docs: Add TODO comment related to the addition of whitespace during decoding
* style: Automatic style checks
* fix: Remove final whitespace in prompt, as prefix whitespace is used by sentencepiece

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
What does this PR do?
The current bitsandbytes integration only supports models that use nn.Linear, which excludes gpt2 and other models that use Conv1D instead. This PR enables loading and serializing these models with bitsandbytes quantization, and adds gpt2-xl tests for int8 and 4-bit.
This is achieved by transposing the weight matrices of Conv1D layers before quantization.
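Roughly, the idea is the following (a sketch under the assumption of transformers' Conv1D weight layout; the helper name is hypothetical and this is not the exact integration code):

```python
import torch
import bitsandbytes as bnb
from transformers.pytorch_utils import Conv1D

def conv1d_to_linear4bit(module: Conv1D, compute_dtype=torch.float16):
    # Conv1D stores its weight as (in_features, out_features); nn.Linear and the
    # bitsandbytes layers expect (out_features, in_features), hence the transpose.
    in_features, out_features = module.weight.shape
    linear = bnb.nn.Linear4bit(
        in_features,
        out_features,
        bias=module.bias is not None,
        compute_dtype=compute_dtype,
    )
    # The actual 4-bit quantization happens when the layer is moved to the GPU;
    # here we only hand over the transposed weight (and the bias, if any).
    linear.weight = bnb.nn.Params4bit(
        module.weight.data.t().contiguous(), requires_grad=False
    )
    if module.bias is not None:
        linear.bias = torch.nn.Parameter(module.bias.data, requires_grad=False)
    return linear
```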
Note: Following the suggestion in the bnb tests to only use models with >1B params, the only remaining candidate is gpt2-xl, which is unfortunately a 6.4 GB download because it is stored in float32.
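For reference, a minimal usage sketch of what this enables (assumes a CUDA GPU plus bitsandbytes and accelerate installed):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load gpt2-xl quantized to 4-bit; before this PR, Conv1D-based models such as
# gpt2 were not supported by the bitsandbytes integration.
model = AutoModelForCausalLM.from_pretrained("gpt2-xl", load_in_4bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```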
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@younesbelkada, @TimDettmers