
Add support for QLoRA/ QAdapter training via bitsandbytes #663

Merged
merged 16 commits into adapter-hub:main from dev/qlora
Apr 23, 2024

Conversation

@calpt (Member) commented Mar 31, 2024

This PR adds support for wrapping bitsandbytes' Linear4bit and Linear8bitLt quantization layers with our LoRA implementation, enabling training of LoRA adapters on quantized models in QLoRA style.

The implementation is loosely similar to HF peft's approach, which can be found here: https://github.com/huggingface/peft/blob/v0.10.0/src/peft/tuners/lora/bnb.py.

Demo

I've added a new notebook here: https://github.com/calpt/adapter-transformers/blob/dev/qlora/notebooks/QLoRA_Llama_Finetuning.ipynb.
The notebook showcases this feature by finetuning a 4-bit-quantized Llama 2 7B on an instruction-tuning dataset (similar to Guanaco in the QLoRA paper).
I've tested that it runs without errors in the provided notebook; other setups are not extensively tested yet.
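
For context, here is a rough sketch of the setup the notebook uses (not copied from the notebook; the model id, adapter name, and hyperparameter values are illustrative):

```python
# Rough usage sketch, assuming the adapters API (adapters.init, add_adapter,
# train_adapter) together with transformers' BitsAndBytesConfig; names and
# values below are illustrative, not taken from the notebook.
import torch
import adapters
from adapters import LoRAConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load Llama 2 7B with its weights quantized to 4-bit NF4 via bitsandbytes.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    torch_dtype=torch.bfloat16,
)

# Enable adapter support on the quantized model, then add and activate a
# trainable LoRA adapter on top of the frozen 4-bit base weights.
adapters.init(model)
model.add_adapter("assistant_adapter", config=LoRAConfig(r=8, alpha=16))
model.train_adapter("assistant_adapter")
```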

Pre-trained checkpoints

Adapters trained with the notebook code can be found here (a loading sketch follows the list):

Llama-2 7B: https://huggingface.co/AdapterHub/llama2-7b-qlora-openassistant
Llama-2 13B: https://huggingface.co/AdapterHub/llama2-13b-qlora-openassistant
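
A minimal loading sketch for these checkpoints (the adapter and base model ids are as listed above; the quantization settings are illustrative, not taken from the model cards):

```python
# Sketch of loading the 7B checkpoint on top of a 4-bit quantized base model;
# assumes adapters.init, load_adapter, and set_active_adapters from the
# adapters API.
import adapters
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

adapters.init(model)
adapter_name = model.load_adapter("AdapterHub/llama2-7b-qlora-openassistant")
model.set_active_adapters(adapter_name)
```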

Current limitations

@calpt linked an issue Mar 31, 2024 that may be closed by this pull request
@calpt linked an issue Apr 1, 2024 that may be closed by this pull request
@calpt marked this pull request as ready for review April 14, 2024 20:55
@calpt changed the title from "WIP: Add support for QLoRA training via bitsandbytes" to "Add support for QLoRA training via bitsandbytes" Apr 14, 2024
@hSterz (Member) left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just one small question about something that is unclear to me:

```python
    # result shape: <batch_size> x <seq_len> x <head_dim>
    layer_output = F.linear(input_states, weight, bias=self.bias)
else:
    layer_output = super().forward(input_states)
```
@hSterz (Member)
Which forward method is called here since this does not inherit from nn.Linear anymore?

@calpt (Member, Author)

The subclasses of this class (LoRALinearTorch, LoRALinear4bit, LoRALinear8bitLt) inherit from different types of linear layers, so super().forward() dispatches to the respective base layer's forward.
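
For illustration, here is a minimal sketch of that dispatch; only the three subclass names come from the comment above, the mixin name and everything else is made up:

```python
# Minimal sketch (not the actual adapters source) of how super().forward()
# resolves through the MRO to the concrete linear implementation each
# subclass is combined with.
import torch
import torch.nn as nn


class LoRAMixin:
    """Shared LoRA logic; deliberately does not inherit from nn.Linear."""

    def forward(self, input_states: torch.Tensor) -> torch.Tensor:
        # super() refers to the next class in the concrete subclass's MRO,
        # i.e. the linear layer type it is paired with below.
        hidden = super().forward(input_states)
        return hidden  # + LoRA delta, omitted in this sketch


class LoRALinearTorch(LoRAMixin, nn.Linear):
    pass  # super().forward() -> nn.Linear.forward (dense fp weights)


# With bitsandbytes installed, the quantized variants pair the same mixin
# with the quantized linear layers instead:
#
#   class LoRALinear4bit(LoRAMixin, bnb.nn.Linear4bit): ...
#   class LoRALinear8bitLt(LoRAMixin, bnb.nn.Linear8bitLt): ...

layer = LoRALinearTorch(16, 32)
print(layer(torch.randn(2, 16)).shape)  # torch.Size([2, 32])
```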

calpt added a commit that referenced this pull request Apr 22, 2024
…678)

Adapters currently does not work correctly when `device_map="auto"` is passed to a model's `from_pretrained()`. Device auto-mapping is handled by HF Accelerate, which wraps the original module forward method.

This PR fixes the compatibility of Adapters' post-hoc model wrapping with Accelerate's device auto-mapping by wrapping the forward pass.

Fixing this is required for enabling quantized training of adapters (bottleneck & prefix-tuning) in #663.
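
For context, a sketch of the usage path that fix targets; the model id and adapter name are placeholders, and it assumes the default `add_adapter` config is a bottleneck adapter:

```python
# Sketch: post-hoc adapters.init() on a quantized model whose modules were
# auto-dispatched across devices by accelerate via device_map="auto".
import adapters
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",  # accelerate wraps each module's forward for dispatch
)

adapters.init(model)  # post-hoc wrapping must preserve the accelerate hooks
model.add_adapter("bn_adapter")    # bottleneck adapter (default config)
model.train_adapter("bn_adapter")
```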
@calpt changed the title from "Add support for QLoRA training via bitsandbytes" to "Add support for QLoRA/ QAdapter training via bitsandbytes" Apr 23, 2024
@calpt merged commit 42c1753 into adapter-hub:main Apr 23, 2024
3 checks passed
@calpt deleted the dev/qlora branch April 23, 2024 21:38