docs: Address PR feedback for PEFT integration guide
- Add comprehensive learning rate section with table and blog links
- Add learning_rate parameters to all code examples (SFT, DPO, GRPO, QLoRA, Prompt Tuning)
- Remove Full Training (No PEFT) sections for cleaner focus
- Remove Troubleshooting section as requested
- Document three methods of PEFT configuration (CLI, peft_config, get_peft_model)
- Enhance Resources section with TRL notebooks, examples, and Cookbook
- Simplify Python examples using ellipsis for non-PEFT configs
- Fix import order (standard library before third-party)

```python
# Configure training - note the higher learning rate for LoRA (10x base rate)
training_args = SFTConfig(
    learning_rate=2.0e-4,  # 10x the base rate (2.0e-5) for LoRA
    ...
)

# Create trainer with PEFT
trainer = SFTTrainer(
    model=model,
    ...
)
```

## Three Ways to Configure PEFT
TRL provides three different methods to configure PEFT, each suited for different use cases:
### 1. Using CLI Flags (Simplest)
The easiest way to enable PEFT is to pass the `--use_peft` flag on the command line. This method is ideal for quick experiments and standard configurations:

```bash
python trl/scripts/sft.py \
    --model_name_or_path Qwen/Qwen2-0.5B \
    --dataset_name trl-lib/Capybara \
    --use_peft \
    --lora_r 32 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --output_dir Qwen2-0.5B-SFT-LoRA
```

**Pros**: Quick setup, no code required

**Cons**: Limited to LoRA, fewer customization options

### 2. Passing peft_config to Trainer (Recommended)
For more control, pass a PEFT configuration directly to the trainer. This is the recommended approach for most use cases:
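
The code for this approach is elided in the diff hunk above; as a rough, minimal sketch (the Qwen2-0.5B model, Capybara dataset, and LoRA hyperparameters are assumptions carried over from the CLI example, and the guide's actual snippet may differ):

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    learning_rate=2.0e-4,  # 10x the base rate (see Learning Rate Considerations below)
    output_dir="Qwen2-0.5B-SFT-LoRA",
)

# The trainer applies the LoRA adapters to the model internally
trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B",
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```

Because the trainer wraps the model for you, the script stays short while every `LoraConfig` option remains available.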

**Pros**: Full control, supports all PEFT methods (LoRA, Prompt Tuning, etc.)

**Cons**: Requires Python code

### 3. Applying PEFT to Model Directly (Advanced)
For maximum flexibility, you can apply PEFT to your model before passing it to the trainer:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
from trl import SFTConfig, SFTTrainer

# Load base model
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")

# Apply PEFT configuration
peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, peft_config)

# Pass PEFT-wrapped model to trainer
trainer = SFTTrainer(
    model=model,  # Already has PEFT applied
    args=training_args,
    train_dataset=dataset,
    # Note: no peft_config needed here
)
```

**Pros**: Maximum control, useful for custom model architectures or complex setups

**Cons**: More verbose, requires understanding of PEFT internals

## Learning Rate Considerations
When using LoRA or other PEFT methods, you typically need to use a **higher learning rate** (approximately 10x) compared to full fine-tuning. This is because PEFT methods train only a small fraction of parameters, requiring a larger learning rate to achieve similar parameter updates.
**Recommended learning rates:**

| Trainer | Full Fine-Tuning | With LoRA (10x) |
|---------|------------------|-----------------|
| **SFT** | `2.0e-5` | `2.0e-4` |
| **DPO** | `5.0e-7` | `5.0e-6` |
| **GRPO** | `1.0e-6` | `1.0e-5` |
| **Prompt Tuning** | N/A | `1.0e-2` to `3.0e-2` |

> **Why 10x?** LoRA adapters have significantly fewer trainable parameters than the full model. A higher learning rate compensates for this reduced parameter count, ensuring effective training. For a detailed explanation, see [this blog post](https://thinkingmachines.ai/blog/lora/).
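
As a quick illustration of how the table values plug into the trainer configs (a minimal sketch; the `output_dir` names are arbitrary placeholders):

```python
from trl import DPOConfig, GRPOConfig, SFTConfig

# LoRA runs use roughly 10x the corresponding full fine-tuning learning rate
sft_args = SFTConfig(learning_rate=2.0e-4, output_dir="Qwen2-0.5B-SFT-LoRA")
dpo_args = DPOConfig(learning_rate=5.0e-6, output_dir="Qwen2-0.5B-DPO-LoRA")
grpo_args = GRPOConfig(learning_rate=1.0e-5, output_dir="Qwen2-0.5B-GRPO-LoRA")
```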
For additional best practices on using LoRA effectively, refer to the [LoRA Without Regret](lora_without_regret) documentation.
## PEFT with Different Trainers
TRL's trainers support PEFT configurations for various training paradigms. Below are detailed examples for each major trainer.
The `SFTTrainer` is used for supervised fine-tuning on instruction datasets.
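
The guide's per-trainer examples are not visible in this excerpt. As one hedged sketch of the QLoRA variant the commit message mentions (a 4-bit quantized base model plus LoRA adapters; the model, dataset, and hyperparameters below are assumptions carried over from the earlier examples):

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

# Quantize the frozen base model to 4-bit; only the LoRA adapters are trained
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    quantization_config=bnb_config,
)

dataset = load_dataset("trl-lib/Capybara", split="train")

peft_config = LoraConfig(
    r=32,
    lora_alpha=16,
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

training_args = SFTConfig(
    learning_rate=2.0e-4,  # the same 10x rule applies to QLoRA
    output_dir="Qwen2-0.5B-SFT-QLoRA",
)

trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```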
2. Use Flash Attention 2: `--attn_implementation flash_attention_2`
3. Use bf16: `--bf16`
4. Reduce gradient checkpointing frequency

- **[SFT with LoRA/QLoRA Notebook](https://github.com/huggingface/trl/blob/main/examples/notebooks/sft_trl_lora_qlora.ipynb)** - Complete working example showing both LoRA and QLoRA implementations
- **[TRL Examples Directory](https://github.com/huggingface/trl/tree/main/examples)** - Collection of training scripts demonstrating PEFT with different trainers
- **[TRL Cookbook Recipes](https://github.com/huggingface/cookbook/tree/main/notebooks/transformers)** - Step-by-step guides for common PEFT training scenarios

### Documentation
- [PEFT Documentation](https://huggingface.co/docs/peft) - Official PEFT library documentation
- [TRL Documentation](https://huggingface.co/docs/trl) - Complete TRL documentation with trainer guides
- [LoRA Without Regret](lora_without_regret) - Best practices for using LoRA effectively