FIX [`quantization` / `ESM`] Fix ESM 8bit / 4bit with bitsandbytes #29329

younesbelkada · 2024-02-28T01:11:48Z

What does this PR do?

Currently on main, simply running:

import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("facebook/esm2_t36_3B_UR50D", load_in_4bit=True)

fails with an error:

  File "/home/younes_huggingface_co/code/transformers/src/transformers/modeling_utils.py", line 802, in _load_state_dict_into_meta_model
    or (not hf_quantizer.check_quantized_param(model, param, param_name, state_dict))
  File "/home/younes_huggingface_co/code/transformers/src/transformers/quantizers/quantizer_bnb_4bit.py", line 124, in check_quantized_param
    if isinstance(module._parameters[tensor_name], bnb.nn.Params4bit):
KeyError: 'inv_freq'

This is because the model pushed to "facebook/esm2_t36_3B_UR50D" does not contain `inv_freq`. During the HfQuantizer refactor we probably did not handle that specific scenario properly, which leads to this bug for transformers > 4.37.
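To make the `KeyError` in the traceback concrete, here is a minimal sketch of one way a name such as `inv_freq` can be missing from `module._parameters`: it is registered as a buffer, as the linked `register_buffer` issues suggest. `RotaryStub` and `is_parameter_of` are hypothetical names made up for illustration, and the `.get(...)` guard is only one possible way to avoid the crash, not necessarily the change merged in this PR.

```python
import torch
from torch import nn


class RotaryStub(nn.Module):
    """Hypothetical stand-in for a module that stores inv_freq as a buffer."""

    def __init__(self, dim: int = 8):
        super().__init__()
        inv_freq = 1.0 / (10000 ** (torch.arange(0, dim, 2).float() / dim))
        # Buffers are stored in module._buffers, not module._parameters.
        self.register_buffer("inv_freq", inv_freq)
        self.proj = nn.Linear(dim, dim)


module = RotaryStub()
in_params = "inv_freq" in module._parameters
in_buffers = "inv_freq" in module._buffers

# Direct indexing, as in the traceback, raises KeyError for a buffer name.
try:
    module._parameters["inv_freq"]
    key_error_raised = False
except KeyError:
    key_error_raised = True


def is_parameter_of(mod: nn.Module, tensor_name: str) -> bool:
    """Assumed guard: .get() returns None instead of raising for non-parameters."""
    return isinstance(mod._parameters.get(tensor_name, None), nn.Parameter)
```

With a guard of this kind, a buffer name simply reports "not a parameter" instead of aborting the load; in the real quantizer the `isinstance` check would target `bnb.nn.Params4bit` rather than `nn.Parameter`.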

cc @SunMarc

I ran the quantization tests and they all seem to pass on my end.

younesbelkada · 2024-02-28T01:12:09Z

src/transformers/models/esm/modeling_esm.py

@@ -377,7 +377,7 @@ def forward(
        if head_mask is not None:
            attention_probs = attention_probs * head_mask

-        context_layer = torch.matmul(attention_probs, value_layer)
+        context_layer = torch.matmul(attention_probs.to(value_layer.dtype), value_layer)


This was needed for inference to run correctly; otherwise you get a dtype mismatch.

What do we get if we don't apply this fix?

You get a dtype mismatch :/
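The dtype mismatch discussed in this thread can be reproduced in isolation. The sketch below is an assumption-laden illustration, not the ESM code path itself: the shapes are arbitrary, and `bfloat16` on CPU stands in for the reduced-precision dtype that the value tensor has under quantized inference.

```python
import torch

# Attention probabilities stay in float32, as they might after a softmax.
attention_probs = torch.softmax(torch.randn(1, 2, 4, 4), dim=-1)
# The value tensor is in a lower-precision dtype (stand-in for the quantized path).
value_layer = torch.randn(1, 2, 4, 8).to(torch.bfloat16)

# torch.matmul does not promote mixed dtypes, so this call is rejected.
try:
    torch.matmul(attention_probs, value_layer)
    mismatch_raised = False
except RuntimeError:
    mismatch_raised = True

# Same cast as in the diff above: align the probabilities with the value dtype.
context_layer = torch.matmul(attention_probs.to(value_layer.dtype), value_layer)
```

Casting the probabilities to `value_layer.dtype` keeps the output in the dtype the rest of the layer expects, at the cost of some precision in the attention weights.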

SunMarc

LGTM !

ArthurZucker

Thanks

src/transformers/quantizers/quantizer_bnb_4bit.py

src/transformers/quantizers/quantizer_bnb_8bit.py

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

Titus-von-Koeller · 2024-03-11T09:34:07Z

Thanks for the quick fix, everyone!

…29329) * fix ESM 8bit * Apply suggestions from code review Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com> * fixup --------- Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

fix ESM 8bit

5c7ffa5

younesbelkada commented Feb 28, 2024

younesbelkada requested a review from SunMarc February 28, 2024 01:12

younesbelkada mentioned this pull request Feb 28, 2024

transformers > 4.37 breaks bitsandbyte int8 inference of ESM models #29323

Closed

4 tasks

SunMarc approved these changes Feb 28, 2024

younesbelkada requested a review from ArthurZucker February 29, 2024 01:54

ArthurZucker approved these changes Feb 29, 2024

src/transformers/quantizers/quantizer_bnb_4bit.py (outdated, resolved)

src/transformers/quantizers/quantizer_bnb_8bit.py (outdated, resolved)

younesbelkada and others added 2 commits March 1, 2024 02:36

Apply suggestions from code review

4ddddf5

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>

fixup

44e8430

younesbelkada merged commit 50db7ca into huggingface:main Mar 1, 2024
21 checks passed

younesbelkada deleted the fix-8bit-esm branch March 1, 2024 02:01

DonggeunYu mentioned this pull request Mar 4, 2024

Fixed quantization error in modules that use register_buffer #29203

Closed

5 tasks

SunMarc mentioned this pull request Mar 7, 2024

Quantization Error: register_buffer is not in module_parameters. #29201

Closed

4 tasks

matthewdouglas mentioned this pull request Mar 8, 2024

ESM models quantization broken in current BNB bitsandbytes-foundation/bitsandbytes#1117

Closed

LZHgrla mentioned this pull request Mar 19, 2024

Fine-tuning chatglm3-6b fails with "Could not locate the tokenization_chatglm.py inside THUDM/chatglm3-6b." InternLM/xtuner#488

Closed
