RuntimeError: only Tensors of floating point dtype can require gradients for QLoRA since transformers 4.40 #1720
Comments
Yes, I can reproduce the error. The reason is that since transformers==4.40, the pre_classifier module of this model is converted to a bitsandbytes Linear4bit instead of being a normal PyTorch nn.Linear. As this module is added to the modules_to_save, PEFT tries to enable gradients on it, resulting in the error you see. We'll discuss this internally and think of an appropriate fix. In the meantime, if possible, downgrade to an earlier transformers version.
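For context, a minimal sketch of the failure mechanism (a plain uint8 tensor stands in for the quantized Linear4bit weight; this is not PEFT's actual code path):
import torch

# Integer-dtype parameters are allowed only while requires_grad is False.
int_weight = torch.zeros(4, 4, dtype=torch.uint8)
param = torch.nn.Parameter(int_weight, requires_grad=False)
# Flipping gradients on raises the RuntimeError from the issue title.
param.requires_grad_(True)
|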
Thanks for checking on this, appreciate it. Sure, for now I'll use the earlier version for my projects and demos. |
Hi @dipanjanS !
Thanks for the issue, I had a deeper look. Previously there was a silent bug in transformers that left the pre_classifier layer un-quantized, which shouldn't happen, as only the last layer should be left un-quantized. huggingface/transformers#29958 introduced a fix for that, and with it the behavior you are seeing, which isn't really a bug, since the pre_classifier should be quantized in the first place; only the last layer shouldn't be quantized.
To temporarily fix your issue, can you load the 4-bit model with llm_int8_skip_modules=["classifier", "pre_classifier"]?
model_checkpoint = "distilbert/distilbert-base-uncased"
id2label = {0: "NEGATIVE", 1: "POSITIVE"}
label2id = {"NEGATIVE": 0, "POSITIVE": 1}
import torch
from transformers import AutoModelForSequenceClassification, TrainingArguments, Trainer, BitsAndBytesConfig
config = BitsAndBytesConfig(
load_in_4bit=True, # quantize the model to 4-bits when you load it
bnb_4bit_quant_type="nf4", # use a special 4-bit data type for weights initialized from a normal distribution
bnb_4bit_use_double_quant=True, # nested quantization scheme to quantize the already quantized weights
bnb_4bit_compute_dtype=torch.bfloat16, # use bfloat16 for faster computation
llm_int8_skip_modules=["classifier", "pre_classifier"] # keep the classification head out of quantization
)
model = AutoModelForSequenceClassification.from_pretrained(model_checkpoint,
id2label=id2label,
label2id=label2id,
num_labels=2,
quantization_config=config)
from peft import prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)
from peft import LoraConfig, get_peft_model, TaskType, replace_lora_weights_loftq
config = LoraConfig(
r=8,
lora_alpha=32,
target_modules=["q_lin", "k_lin", "v_lin", "out_lin"],
lora_dropout=0.05,
bias="none",
task_type=TaskType.SEQ_CLS)
peft_model = get_peft_model(model, config)
replace_lora_weights_loftq(peft_model)
print_trainable_parameters(peft_model) # helper from the notebook; equivalently: peft_model.print_trainable_parameters()
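To confirm the workaround took effect, a quick sanity check (a sketch; run it right after from_pretrained, before any PEFT wrapping — module paths are those of DistilBertForSequenceClassification):
print(type(model.pre_classifier)) # expected: torch.nn.Linear, kept in full precision
print(type(model.classifier)) # expected: torch.nn.Linear
print(type(model.distilbert.transformer.layer[0].attention.q_lin)) # expected: bitsandbytes Linear4bit
|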
Awesome, can confirm this definitely works! Just wanted to check: going forward, should I explicitly mention those classifier layers to be skipped in the BnB configuration, or would this be handled automatically in a future release? Based on your recommendation I will use that going forward. |
Hi @dipanjanS |
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. |
Thank you for the detailed answer. I would like to ask: how can we know which layers are supposed to be quantized and which aren't, so we can fix this issue in other models? In my case I encountered this error while trying to load Llama3-8B using the following config: |
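(As a rough sketch of how one might answer this for another architecture: build the model skeleton without loading weights and list its Linear layers; the task head at the end is what to pass to llm_int8_skip_modules. The repo id below is an assumption for the Llama3-8B mentioned above, and access to it is gated; for LlamaForSequenceClassification the head is named score, though that is worth verifying on your checkpoint:)
import torch
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForSequenceClassification

config = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B", num_labels=2)
with init_empty_weights(): # build the module tree on the meta device, no real weights allocated
    model = AutoModelForSequenceClassification.from_config(config)
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Linear):
        print(name) # head entries such as "score" should stay un-quantized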
System Info
Who can help?
@pacman100
Information
Tasks
examples folder
Reproduction
This is the Colab notebook for a simple fine-tuning of a DistilBERT model using QLoRA.
The main code snippet of interest, which is erroring out:
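(The snippet itself did not survive this scrape; reconstructed from the fixed version above by omitting llm_int8_skip_modules, the failing setup would have looked roughly like this:)
import torch
from transformers import AutoModelForSequenceClassification, BitsAndBytesConfig
from peft import LoraConfig, TaskType, get_peft_model, prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    # no llm_int8_skip_modules here, so pre_classifier also gets quantized
)
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert/distilbert-base-uncased", num_labels=2, quantization_config=bnb_config
)
model = prepare_model_for_kbit_training(model)
lora_config = LoraConfig(
    r=8, lora_alpha=32, lora_dropout=0.05, bias="none",
    target_modules=["q_lin", "k_lin", "v_lin", "out_lin"],
    task_type=TaskType.SEQ_CLS,
)
peft_model = get_peft_model(model, lora_config) # RuntimeError under transformers==4.40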
The error happens in the line
peft_model = get_peft_model(model, config)
above, when the PEFT model is being created. The error trace is as follows.
Expected behavior
Ideally the model should get created and then fine-tuned. The same notebook used to work fine with transformers==4.38, but something might have changed, as it is no longer working with transformers==4.40; I have validated that the code still works when I downgrade. I want some help in figuring out whether something is wrong in my code that I need to change, whether I am doing something fundamentally wrong, or whether there is a deeper issue.