Conversation

@behroozazarkhalili
Collaborator

Resolves #4376

This PR completely rewrites the PEFT integration documentation to address the concerns raised in #4376.

Changes

  • Comprehensive Trainer Coverage: Added detailed examples for SFT, DPO, and GRPO trainers with PEFT
  • QLoRA Section: Added complete QLoRA guide with 4-bit and 8-bit quantization examples
  • Prompt Tuning: Added new section on prompt tuning with configuration examples
  • Updated Examples: All code examples updated to match current TRL API (LoRA r=32, alpha=16)
  • Removed Outdated Content: Removed PPO-only focus and outdated information
  • Enhanced Documentation: Added troubleshooting section, multi-GPU training guidance, and command-line arguments reference

Documentation Structure

  1. Installation
  2. Quick Start
  3. PEFT with Different Trainers (SFT, DPO, GRPO)
  4. QLoRA: Quantized Low-Rank Adaptation
  5. Prompt Tuning
  6. Advanced PEFT Configurations
  7. Saving and Loading PEFT Models
  8. Multi-GPU Training
  9. Troubleshooting
  10. Resources

All examples have been verified against the current TRL codebase and official scripts.

Resolves huggingface#4376

- Add detailed examples for SFT, DPO, and GRPO trainers with PEFT
- Add QLoRA section with 4-bit and 8-bit quantization examples
- Add Prompt Tuning section with configuration examples
- Update all examples to match current TRL API (LoRA r=32, alpha=16)
- Remove outdated PPO-only focus
- Add troubleshooting section and multi-GPU training guidance
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@sergiopaniego sergiopaniego left a comment


Thanks for the update!! Super detailed 😄


The notebooks and scripts in these examples show how to use Low Rank Adaptation (LoRA) to fine-tune models in a memory-efficient manner. Most PEFT methods are supported by the peft library, but note that some PEFT methods, such as prompt tuning, are not supported.
For more information on LoRA, see the [original paper](https://huggingface.co/papers/2106.09685).
TRL supports [PEFT](https://github.com/huggingface/peft) (Parameter-Efficient Fine-Tuning) methods for memory-efficient model training. PEFT enables fine-tuning large language models by training only a small number of additional parameters while keeping the base model frozen, significantly reducing computational costs and memory requirements.
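
For illustration, the kind of quick-start this intro leads into might look like the sketch below. The model, dataset, and LoRA values (r=32, alpha=16) are borrowed from other snippets quoted in this PR, so treat it as an approximation rather than the exact doc text.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

# LoRA adapter configuration; only the adapter weights are trained.
peft_config = LoraConfig(r=32, lora_alpha=16, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B",
    args=SFTConfig(output_dir="Qwen2-0.5B-SFT-LoRA", learning_rate=2.0e-4),
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```
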
Member


And if you want to load your model in 8bit precision:
## PEFT with Different Trainers

TRL's trainers support PEFT configurations for various training paradigms. Below are detailed examples for each major trainer.
Member


We could leverage the usage of

<hfoptions id="command_line">
<hfoption id="SFT">
...
</hfoption>
<hfoption id="DPO">
...
</hfoption>
</hfoptions>

in this section to reduce the number of sections and improve readability.

config.model_name,
load_in_8bit=True,
peft_config=lora_config,
from datasets import load_dataset
Member


We could focus only on the ideas needed for PEFT and simplify the rest to reduce the snippets.

For example, we could do:

training_args = SFTConfig(
   ...
)

similar for any part that is not strictly needed for the configuration
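
As a concrete version of that suggestion, a PEFT-focused snippet could shrink to something like the sketch below, keeping the LoraConfig explicit (r=32/alpha=16 as used elsewhere in this PR) and only the training arguments the example actually needs.

```python
from peft import LoraConfig
from trl import SFTConfig

# Keep the PEFT-specific configuration explicit...
peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# ...and show only the training arguments the example needs; everything else
# can appear as `SFTConfig(...)` in the docs.
training_args = SFTConfig(output_dir="Qwen2-0.5B-SFT-LoRA", learning_rate=2.0e-4)
```
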




## Resources
Member


We could include here TRL notebooks, TRL examples, and recipes from cookbook (https://huggingface.co/learn/cookbook/index) that leverage PEFT

dataset = load_dataset("trl-lib/Capybara", split="train")

# Configure LoRA
peft_config = LoraConfig(
Member


We actually have 3 different ways of adding the peft config to the trainer:

  1. We give the model_name to the Trainer and the peft_config
  2. We give the model instance and the peft_config
  3. We give the peft_model to the trainer directly, preparing it outside, without passing peft_config to the trainer.

We could add these details somewhere.
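
A rough sketch of what such a note could show, assuming the current SFTTrainer and peft APIs (dataset arguments omitted for brevity; in practice you would use exactly one of these patterns):

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

peft_config = LoraConfig(r=32, lora_alpha=16, task_type="CAUSAL_LM")
training_args = SFTConfig(output_dir="Qwen2-0.5B-SFT-LoRA", learning_rate=2.0e-4)

# 1. Pass the model name and the peft_config; the trainer loads and wraps the model.
trainer = SFTTrainer(model="Qwen/Qwen2-0.5B", args=training_args, peft_config=peft_config)

# 2. Pass an already-loaded model instance together with the peft_config.
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")
trainer = SFTTrainer(model=model, args=training_args, peft_config=peft_config)

# 3. Wrap the model yourself and pass the resulting PeftModel, without peft_config.
base_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")
peft_model = get_peft_model(base_model, peft_config)
trainer = SFTTrainer(model=peft_model, args=training_args)
```
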

# Create trainer with PEFT
trainer = SFTTrainer(
    model=model,
    args=training_args,
Member


Suggested change
args=training_args,

```python
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead
from trl import SFTConfig, SFTTrainer
```
Member


Suggested change
from trl import SFTConfig, SFTTrainer
from trl import SFTTrainer


TRL's trainers support PEFT configurations for various training paradigms. Below are detailed examples for each major trainer.

### Supervised Fine-Tuning (SFT)
Member


instead of subsections, I'd write it with

<hfoptions id="trainer">
<hfoption id="SFT">

```
# Code for SFT
```

</hfoption>
<hfoption id="DPO">

```
Code for DPO
```

</hfoption>
</hfoptions>

# Training arguments
training_args = SFTConfig(
    output_dir="./Qwen2-0.5B-SFT-LoRA",
    learning_rate=2.0e-4,
Member


In my opinion, it is very important that all examples on this page contain an explicit learning rate (corresponding to 10x the trainer's default learning rate). Even better would be a small section explaining why, with a link to https://thinkingmachines.ai/blog/lora/.
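
For instance, assuming SFTConfig's default learning rate of 2.0e-5, a LoRA example would carry an explicit ~10x value along these lines (sketch only):

```python
from trl import SFTConfig

# The SFT default is ~2.0e-5 for full fine-tuning; LoRA examples here use roughly 10x that.
training_args = SFTConfig(
    output_dir="Qwen2-0.5B-SFT-LoRA",
    learning_rate=2.0e-4,
)
```
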


Comment on lines +145 to 155
#### Full Training (No PEFT)

```bash
python trl/scripts/dpo.py \
    --model_name_or_path Qwen/Qwen2-0.5B-Instruct \
    --dataset_name trl-lib/ultrafeedback_binarized \
    --learning_rate 5.0e-7 \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 8 \
    --output_dir Qwen2-0.5B-DPO
```
Member


I don't think these "No PEFT" sections are necessary

For slightly higher precision with reduced memory savings, you can use 8-bit quantization:

```python
from transformers import BitsAndBytesConfig
```
Member


Suggested change
from transformers import BitsAndBytesConfig
from transformers import BitsAndBytesConfig, AutoModelForCausalLM
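
Put together, the 8-bit variant of this snippet could read roughly as follows; the model name and LoRA values are borrowed from the other examples in this PR, so treat it as a sketch rather than the final doc text.

```python
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# Load the base model in 8-bit precision.
quantization_config = BitsAndBytesConfig(load_in_8bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    quantization_config=quantization_config,
    device_map="auto",
)

# Train LoRA adapters on top of the quantized base model.
peft_config = LoraConfig(r=32, lora_alpha=16, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="Qwen2-0.5B-SFT-LoRA-8bit", learning_rate=2.0e-4),
    peft_config=peft_config,
)
```
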

Comment on lines +642 to +665
## Troubleshooting

### Out of Memory Errors

If you encounter OOM errors:

1. Enable QLoRA: `--load_in_4bit`
2. Reduce batch size: `--per_device_train_batch_size 1`
3. Increase gradient accumulation: `--gradient_accumulation_steps 16`
4. Enable gradient checkpointing: `--gradient_checkpointing`
5. Reduce LoRA rank: `--lora_r 8`
6. Reduce target modules: `--lora_target_modules q_proj v_proj`

### Slow Training

If training is slow:

1. Increase batch size (if memory allows)
2. Use Flash Attention 2: `--attn_implementation flash_attention_2`
3. Use bf16: `--bf16`
4. Reduce gradient checkpointing frequency



Member


Most of these are not specific to PEFT, so I recommend removing this section and adding these elements to reducing_memory_usage.md or speeding_up_training.md (this can be done in a follow-up PR).

@behroozazarkhalili
Collaborator Author

Addressed Reviewer Feedback

Thank you for the detailed review! I've addressed all the comments:

✅ Completed Changes

  1. Learning Rates - CRITICAL:

    • Added comprehensive "Learning Rate Considerations" section with table showing recommended rates for each trainer
    • Added explicit learning_rate parameters to all 6 Python code examples (SFT, DPO, GRPO, QLoRA, Prompt Tuning)
    • Included LoRA blog post and LoRA Without Regret documentation references
  2. Three PEFT Configuration Methods:

    • Added new section documenting all three approaches:
      • CLI flags (--use_peft)
      • Passing peft_config to trainer (recommended)
      • Applying PEFT to model directly with get_peft_model (advanced)
    • Each method includes pros/cons and code examples
  3. Resources Enhancement:

    • Reorganized Resources section with categories
    • Added TRL notebooks (SFT with LoRA/QLoRA)
    • Added TRL examples directory link
    • Added TRL Cookbook recipes link
    • Added LoRA Without Regret documentation reference
  4. Code Simplification:

    • All Python examples now use ellipsis (...) for non-PEFT configuration
    • Focus maintained on PEFT-specific parameters
  5. Removed Sections:

    • Removed 3 "Full Training (No PEFT)" subsections from SFT, DPO, and GRPO
    • Removed entire Troubleshooting section
  6. Import Order:

    • Fixed import order in QLoRA example (standard library before third-party)
    • Properly grouped and ordered all imports

Already Addressed

  • Notebook link, hfoptions tabs, SFTTrainer import (already present in initial PR)

All changes committed in cbe38d7.

@behroozazarkhalili behroozazarkhalili enabled auto-merge (squash) November 5, 2025 15:15
behroozazarkhalili added a commit that referenced this pull request Nov 5, 2025
This PR addresses Issue #4376 by completely rewriting the PEFT
integration documentation with:

- Comprehensive Learning Rate section with table and best practices
- Documentation of three PEFT configuration methods
- Enhanced Resources section with notebooks, examples, and Cookbook
- Updated code examples for SFT, DPO, GRPO, QLoRA, and Prompt Tuning
- Removed outdated sections per reviewer feedback
- Fixed import ordering and code simplification

All reviewer feedback from PR #4421 has been addressed.