Enable finetuning with torchao quantized model #33361
Conversation
@@ -166,7 +166,8 @@ def is_serializable(self):

    @property
    def is_trainable(self):
        # torchao does not have official support for QAT (Quantization Aware Training)
So we do support QAT, and we have experimental support for quantized training as well.
We should probably create another property like is_qat_trainable and change this one to is_peft_trainable to distinguish these two types of training. Do you have a script for QAT training?
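For illustration, a rough sketch of what that split could look like on the torchao quantizer class, assuming the hypothetical property names from the suggestion above and the two int8 quant types this PR treats as trainable (this is not actual transformers code):

```python
# Illustrative stub only: neither property name exists in transformers; the class
# name and the list of trainable quant types are assumptions for this sketch.
class TorchAoHfQuantizer:
    @property
    def is_qat_trainable(self) -> bool:
        # Quantization-aware training, i.e. updating the quantized weights themselves.
        return False

    @property
    def is_peft_trainable(self) -> bool:
        # Training adapters (e.g. LoRA) on top of frozen quantized weights; this only
        # needs gradients to flow through the quantized linear layers.
        return self.quantization_config.quant_type in (
            "int8_weight_only",
            "int8_dynamic_activation_int8_weight",
        )
```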
Sorry for the delay, but here you go: https://github.com/pytorch/ao/tree/main/torchao/quantization/prototype/qat (also cc @andrewor14)
Should the method also check if PEFT is installed? I guess it's not strictly necessary, so this is also fine, just wondering. Also, it would be nice to get confirmation whether others also run into problems with int4 or if it's just me.
If one uses Trainer to train the quantized model without PEFT, it will trigger an error in the Trainer class, so I guess we don't need to add a warning there.
Feel free to merge @SunMarc
supported_quant_types_for_training = [
    "int8_weight_only",
    "int8_dynamic_activation_int8_weight",
]
Sorry, actually I think these configs do not support training currently, since the underlying tensor subclass does not pass gradients correctly. There is an ongoing effort on our side to support this. Can you confirm, @jerryzh168?
WDYM by not passing the gradient correctly? cc @BenjaminBossan
Just to clarify, here we are performing PEFT fine-tuning, meaning that we are only training the adapters (added linear layers) and freezing the other modules (the quantized linear layers). Is the issue during the gradient calculation or at the update step (needed for QAT but not for PEFT)?
Oh I see, thanks for the clarification. So we do not actually train these quantized linear layers, but we still need gradients to flow through them, is that correct? I think a potential issue from the torchao side is that the tensor subclass AffineQuantizedTensor currently explicitly does not require gradients. However, if we're just freezing these layers then it might be fine. Were you able to verify that the end-to-end PEFT accuracies are as expected?
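As an aside, a minimal sketch of how one could check that gradients do reach the LoRA adapters through the frozen quantized layers; the model name, LoRA settings and prompt are illustrative, and it assumes torchao is installed and a CUDA device is available:

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, TorchAoConfig

# Illustrative model; PEFT picks its default LoRA target modules for this architecture.
model_id = "facebook/opt-125m"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=TorchAoConfig("int8_weight_only"),
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=8, lora_alpha=16))

tokenizer = AutoTokenizer.from_pretrained(model_id)
batch = tokenizer("gradient flow check", return_tensors="pt").to(model.device)
model(**batch, labels=batch["input_ids"]).loss.backward()

# The quantized base weights stay frozen (requires_grad=False); only the LoRA
# adapter parameters should have received gradients after backward().
for name, param in model.named_parameters():
    if param.requires_grad:
        assert param.grad is not None, f"no gradient for {name}"
```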
The fact that requires_grad is False should not be an issue; it's the same for other quantization methods. But I can investigate performance compared to, say, bnb. I'll check next week.
I ran a small experiment using LoRA for text classification with google/gemma-2-2b as the base model. Memory was simply measured by observing nvidia-smi.
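For context, a hypothetical sketch of such a setup (not the actual notebook; quant type, LoRA hyper-parameters and target modules are illustrative):

```python
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForSequenceClassification, TorchAoConfig

# Quant type and LoRA settings are illustrative, not necessarily those from the notebook.
quant_config = TorchAoConfig("int8_weight_only")  # or "int8_dynamic_activation_int8_weight"
model = AutoModelForSequenceClassification.from_pretrained(
    "google/gemma-2-2b",
    num_labels=2,
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
lora_config = LoraConfig(task_type="SEQ_CLS", r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"])
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# ...then train with a standard loop or Trainer over the classification dataset...
```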
Results for 8bit bnb:
epoch 1 | train loss 1.0293 | {'accuracy': 0.6397058823529411, 'f1': 0.7282809611829945}
epoch 2 | train loss 0.5860 | {'accuracy': 0.7622549019607843, 'f1': 0.8369747899159664}
epoch 3 | train loss 0.4509 | {'accuracy': 0.7941176470588235, 'f1': 0.8546712802768166}
epoch 4 | train loss 0.3845 | {'accuracy': 0.8112745098039216, 'f1': 0.8683760683760684}
epoch 5 | train loss 0.3431 | {'accuracy': 0.8186274509803921, 'f1': 0.8745762711864407}
Wall time: 4min 29s
memory: 21520MiB
Results for int8_weight_only torchao (notebook):
epoch 1 | train loss 1.0672 | {'accuracy': 0.6715686274509803, 'f1': 0.7751677852348994}
epoch 2 | train loss 0.6261 | {'accuracy': 0.7377450980392157, 'f1': 0.8201680672268907}
epoch 3 | train loss 0.4743 | {'accuracy': 0.7867647058823529, 'f1': 0.8502581755593803}
epoch 4 | train loss 0.4006 | {'accuracy': 0.803921568627451, 'f1': 0.8586572438162544}
epoch 5 | train loss 0.3585 | {'accuracy': 0.8235294117647058, 'f1': 0.8791946308724832}
Wall time: 2min 46s
memory: 18098MiB
Results for int8_dynamic_activation_int8_weight torchao (notebook):
epoch 1 | train loss 1.7618 | {'accuracy': 0.46568627450980393, 'f1': 0.5458333333333333}
epoch 2 | train loss 1.1905 | {'accuracy': 0.5245098039215687, 'f1': 0.6325757575757576}
epoch 3 | train loss 1.1478 | {'accuracy': 0.5318627450980392, 'f1': 0.6456400742115028}
epoch 4 | train loss 1.1384 | {'accuracy': 0.5367647058823529, 'f1': 0.6506469500924215}
epoch 5 | train loss 1.1365 | {'accuracy': 0.5367647058823529, 'f1': 0.6506469500924215}
Wall time: 4min 2s
memory: 4122MiB
So int8_weight_only compares quite favorably to bnb 8bit, as the scores are very close but torchao is faster and requires a little bit less memory. int8_dynamic_activation_int8_weight is absolutely great when it comes to memory (~3.2 GB for the model itself, i.e. only 1 GB for hidden states etc.) while still being reasonably fast. However, the scores are considerably worse. Not sure if that's expected or if I should use different settings/params.
I tried increasing the learning rate for int8_dynamic_activation_int8_weight. With 10x the learning rate, I could get a final score of:
epoch 5 | train loss 0.6309 | {'accuracy': 0.6985294117647058, 'f1': 0.7890222984562607}
Still not as good as the other runs, but a significant improvement. Is it expected that int8_dynamic_activation_int8_weight requires different hyper-parameters compared to int8_weight_only?
What does this PR do?
This PR enables training with torchao quantized models. @BenjaminBossan conducted a few experiments with torchao + PEFT, and it works out of the box for the int8 quantizers. More details here.
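For readers landing here, a hedged end-to-end sketch of the workflow this PR enables, combining a torchao-quantized model with PEFT adapters and Trainer; the model, dataset and hyper-parameters are illustrative, not prescribed by the PR:

```python
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    TorchAoConfig,
    Trainer,
    TrainingArguments,
)

# Illustrative model and dataset choices.
model_id = "facebook/opt-350m"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=TorchAoConfig("int8_weight_only"),
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# Trainer will not fine-tune a quantized model directly, so wrap it with PEFT adapters first.
model = get_peft_model(model, LoraConfig(task_type="CAUSAL_LM", r=16, lora_alpha=32))

tokenizer = AutoTokenizer.from_pretrained(model_id)
dataset = load_dataset("imdb", split="train[:1%]").map(
    lambda x: tokenizer(x["text"], truncation=True, max_length=256),
    batched=True,
    remove_columns=["text", "label"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="opt-torchao-lora", per_device_train_batch_size=4, num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```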