Supervised Fine-tuning (SFT) with PEFT

In this example, we'll see how to perform SFT with PEFT on various single-GPU and multi-GPU setups.

Single GPU SFT with QLoRA

QLoRA uses 4-bit quantization of the base model to drastically reduce the GPU memory consumed by the base model while using LoRA for parameter-efficient fine-tuning. The command to use QLoRA is present at run_peft.sh.
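To make the memory savings concrete, here is a rough back-of-the-envelope sketch. It assumes a hypothetical 7-billion-parameter model and counts weights only, ignoring quantization constants, activations, gradients, and optimizer state. The function names are illustrative and are not part of PEFT.

```python
def weight_memory_gib(num_params: int, bits_per_param: float) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_in x d_out linear layer.

    LoRA learns two low-rank matrices: A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


if __name__ == "__main__":
    n = 7_000_000_000  # assumed model size, for illustration only
    print(f"16-bit weights: {weight_memory_gib(n, 16):.1f} GiB")
    print(f" 4-bit weights: {weight_memory_gib(n, 4):.1f} GiB")
    # One 4096 x 4096 projection with rank 8 (assumed values)
    print(f"LoRA params for one layer: {lora_params(4096, 4096, 8)}")
```

The 4-bit base model needs roughly a quarter of the weight memory of a 16-bit one, and the LoRA adapters add only a small number of trainable parameters on top.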

Note:

At present, use_reentrant needs to be True when using gradient checkpointing with QLoRA; otherwise, QLoRA leads to high GPU memory consumption.

Single GPU SFT with QLoRA using Unsloth

Unsloth enables finetuning Mistral/Llama models 2-5x faster with 70% less memory. It achieves this by reducing data upcasting, using Flash Attention 2, using custom Triton kernels for RoPE embeddings, RMS Layernorm, and Cross Entropy Loss, and using hand-written autograd computations to reduce the FLOPs during QLoRA finetuning. Below is the list of the optimizations from the Unsloth blogpost mistral-benchmark. The command to use QLoRA with Unsloth is present at run_unsloth_peft.sh.

Figure: Optimizations in Unsloth to speed up QLoRA finetuning while reducing GPU memory usage

Multi-GPU SFT with QLoRA

To speed up QLoRA finetuning when you have access to multiple GPUs, look at the launch command at run_peft_multigpu.sh. This example performs DDP on 8 GPUs.
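With DDP, each GPU processes its own batch, so the effective batch size scales with the number of GPUs. The sketch below shows the arithmetic; the per-device batch size and gradient accumulation steps are assumed values for illustration and are not taken from run_peft_multigpu.sh.

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Number of samples contributing to each optimizer step under DDP."""
    return per_device_batch * grad_accum_steps * num_gpus


if __name__ == "__main__":
    # Assumed values: 8 samples per GPU, 2 accumulation steps, 8 GPUs
    print(effective_batch_size(per_device_batch=8, grad_accum_steps=2, num_gpus=8))
```

When moving from a single GPU to 8 GPUs, you may want to revisit the learning rate or the accumulation steps so the effective batch size stays where you intend it.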

Note:

At present, use_reentrant needs to be False when using gradient checkpointing with Multi-GPU QLoRA; otherwise, it will lead to errors. However, setting it to False leads to high GPU memory consumption.
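The two notes on use_reentrant can be summarized in a small helper. This is a minimal sketch: the function name is hypothetical, and how the example scripts actually pass the value is not shown here. In recent transformers versions, a dict of this shape can be given to TrainingArguments through its gradient_checkpointing_kwargs argument.

```python
def gradient_checkpointing_kwargs(multi_gpu: bool) -> dict:
    """Choose use_reentrant following the notes in this README.

    Single-GPU QLoRA: use_reentrant=True (avoids high memory consumption).
    Multi-GPU QLoRA:  use_reentrant=False (True leads to errors).
    """
    return {"use_reentrant": not multi_gpu}


if __name__ == "__main__":
    print("single GPU:", gradient_checkpointing_kwargs(multi_gpu=False))
    print("multi GPU: ", gradient_checkpointing_kwargs(multi_gpu=True))
```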

Multi-GPU SFT with LoRA and DeepSpeed

When you have access to multiple GPUs, it is better to use normal LoRA with DeepSpeed/FSDP. To use LoRA with DeepSpeed, refer to the docs at PEFT with DeepSpeed.

Multi-GPU SFT with LoRA and FSDP

When you have access to multiple GPUs, it is better to use normal LoRA with DeepSpeed/FSDP. To use LoRA with FSDP, refer to the docs at PEFT with FSDP.

Tip

Generally try to upgrade to the latest package versions for best results, especially when it comes to bitsandbytes, accelerate, transformers, trl, and peft.
