
4-bit QLoRA via bitsandbytes (4-bit base model + LoRA) #476

Merged
15 commits merged into huggingface:main on May 20, 2023

Conversation

TimDettmers
Contributor

This adds QLoRA support to PEFT. More information about QLoRA from our abstract:

We develop QLoRA tuning, a method that finetunes by backpropagating gradients through a frozen 4-bit base model into low-rank adapters (LoRA). With QLoRA tuning we can finetune 30B/65B parameter models on 24/48GB GPUs while preserving regular 16-bit full finetuning runtime and task performance. We achieve the memory efficiency and quantization precision through a combination of new methods: nested quantization to reduce the average memory footprint from 4.5 to 4.1 bits per parameter, paged optimizers to manage gradient checkpointing memory spikes, and a new data type, 4-bit NormalFloat (NF4), which is information-theoretically and empirically optimal for normally distributed weights. To demonstrate the effectiveness and ease of use of QLoRA tuning we finetune more than 1,000 models to create a detailed dissection of instruction following performance across datasets (FLAN, Alpaca, Chip2, SuperNatural Instructions, AnthropicHH), model types (LLaMA, T5), and model scales (125M to 65B). A discussion of the results is forthcoming in our paper.
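To illustrate the workflow this enables, below is a minimal sketch that loads a base model in 4-bit NF4 with nested quantization and attaches LoRA adapters on top. The model id and LoRA hyperparameters are placeholders, and the config and helper names assume a transformers and PEFT stack with the 4-bit bitsandbytes integration, so treat it as a sketch rather than the exact API added in this PR.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Placeholder model id; any causal LM supported by bitsandbytes should work.
model_id = "facebook/opt-350m"

# 4-bit NF4 quantization with nested (double) quantization; compute runs in bf16.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

# Prepare the frozen quantized base model for training (casts norms, enables
# input gradients so gradient checkpointing works); helper name assumed.
model = prepare_model_for_kbit_training(model)

# Attach LoRA adapters; only these low-rank matrices receive gradients.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()

The paged optimizer mentioned above can then be selected at training time, for example via optim="paged_adamw_32bit" in TrainingArguments on recent transformers releases, to absorb the memory spikes from gradient checkpointing.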

@sgugger @younesbelkada

Contributor

@younesbelkada younesbelkada left a comment


Thank you for your inspiring work, as always! I just left 3 comments - also curious to see what @sgugger will say!

setup.py Outdated
@@ -41,7 +41,6 @@
     "packaging>=20.0",
     "psutil",
     "pyyaml",
-    "torch>=1.13.0",
Contributor


I think this change is not needed

@@ -18,7 +18,6 @@
 import numpy as np
 import torch
 import transformers
-import wandb
Contributor


Probably this change is not needed either

Comment on lines +737 to +740
if hasattr(self.base_model, "model"):
    self.base_model.model.generation_config = self.generation_config
else:
    self.base_model.generation_config = self.generation_config
Contributor


I think these changes are fine for now, I will investigate later why this change is needed

@HuggingFaceDocBuilderDev

HuggingFaceDocBuilderDev commented May 19, 2023

The documentation is not available anymore as the PR was closed or merged.

Contributor

@sgugger sgugger left a comment


Thanks for adding support for 4-bit quantization! Apart from Younes' comments on potentially unrelated changes, this looks great to me!

@TimDettmers
Contributor Author

Thank you, Younes & Sylvain! Thanks for the changes, Younes. This looks all good to me.

Contributor

@younesbelkada younesbelkada left a comment


Again thank you for your great work, Tim!

@younesbelkada younesbelkada merged commit d6015bc into huggingface:main May 20, 2023
@ewof

ewof commented May 20, 2023

holy shit it's happening

@ewof

ewof commented May 20, 2023

dettmers is a hero

6 participants