docs: Address review feedback on PEFT integration guide
Applied all requested changes from PR review:
1. Added a reference link to the example SFT LoRA/QLoRA notebook
2. Implemented hfoptions tabs to organize SFT/DPO/GRPO examples
3. Simplified Python code examples by removing non-PEFT boilerplate
The documentation now focuses more clearly on PEFT-specific configuration
while maintaining all essential information.
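For a sense of what item 3 leaves behind, here is a rough sketch of the PEFT-focused SFT setup the guide converges on; the dataset, model id, and LoRA hyperparameters below are illustrative placeholders rather than values taken from the doc itself:

```python
# Illustrative sketch only: model id, dataset, and hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")  # any SFT-style dataset works

peft_config = LoraConfig(
    r=16,                         # rank of the LoRA update matrices
    lora_alpha=32,                # scaling applied to the LoRA updates
    target_modules="all-linear",  # attach adapters to every linear layer
    task_type="CAUSAL_LM",
)

trainer = SFTTrainer(
    model="Qwen/Qwen2-0.5B",      # SFTTrainer also accepts an already-loaded model
    args=SFTConfig(output_dir="qwen2-sft-lora"),
    train_dataset=dataset,
    peft_config=peft_config,      # the only PEFT-specific argument
)
trainer.train()
```

The point of the simplification is that `peft_config` is the only PEFT-specific addition; everything else is standard trainer usage.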
docs/source/peft_integration.md (25 additions, 80 deletions)
@@ -4,6 +4,8 @@ TRL supports [PEFT](https://github.com/huggingface/peft) (Parameter-Efficient Fi

This guide covers how to use PEFT with different TRL trainers, including LoRA, QLoRA, and prompt tuning techniques.

+For a complete working example, see the [SFT with LoRA/QLoRA notebook](https://github.com/huggingface/trl/blob/main/examples/notebooks/sft_trl_lora_qlora.ipynb).
+
## Installation

To use PEFT with TRL, install the required dependencies:
@@ -60,6 +62,9 @@ trainer = SFTTrainer(

TRL's trainers support PEFT configurations for various training paradigms. Below are detailed examples for each major trainer.

+<hfoptions id="trainer-type">
+<hfoption id="sft">
+
### Supervised Fine-Tuning (SFT)

The `SFTTrainer` is used for supervised fine-tuning on instruction datasets.
@@ -96,18 +101,9 @@ python trl/scripts/sft.py \
#### Python Example

```python
-from datasets import load_dataset
-from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

-# Load model and tokenizer
-model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")
@@ ... @@
+# When using PEFT, ref_model is automatically handled and set to None
# Create trainer with PEFT config
trainer = DPOTrainer(
    model=model,
    ref_model=None,  # Not needed when using PEFT
    args=training_args,
    train_dataset=dataset,
-    peft_config=peft_config,
+    peft_config=peft_config,  # Pass PEFT config here
)

-# Train
trainer.train()
```

**Note:** When using PEFT with DPO, you don't need to provide a separate reference model (`ref_model`). The trainer automatically uses the frozen base model as the reference.

+</hfoption>
+<hfoption id="grpo">
+
### Group Relative Policy Optimization (GRPO)

The `GRPOTrainer` optimizes policies using group-based rewards.
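Because the hunk below captures only fragments of the GRPO example, here is a hedged sketch of how a LoRA config plugs into `GRPOTrainer`; the reward function, dataset, and hyperparameters are illustrative assumptions, not the guide's exact values:

```python
# Illustrative sketch: a toy reward function plus a LoRA config passed to GRPOTrainer.
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

dataset = load_dataset("trl-lib/tldr", split="train")  # needs a "prompt" column

def reward_len(completions, **kwargs):
    # Toy reward: prefer completions close to 50 characters.
    return [-abs(50 - len(completion)) for completion in completions]

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

trainer = GRPOTrainer(
    model="Qwen/Qwen2-0.5B",
    reward_funcs=reward_len,
    args=GRPOConfig(output_dir="qwen2-grpo-lora"),
    train_dataset=dataset,
    peft_config=peft_config,  # only the LoRA adapters are trained
)
trainer.train()
```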
@@ -248,14 +223,9 @@ python trl/scripts/grpo.py \
#### Python Example

```python
-from datasets import load_dataset
-from transformers import AutoModelForCausalLM, AutoTokenizer
@@ ... @@
    model="Qwen/Qwen2-0.5B",  # Can pass model name or loaded model
    args=training_args,
    train_dataset=dataset,
-    peft_config=peft_config,
+    peft_config=peft_config,  # Pass PEFT config here
)

-# Train
trainer.train()
```

+</hfoption>
+</hfoptions>
+
## QLoRA: Quantized Low-Rank Adaptation

QLoRA combines 4-bit quantization with LoRA to enable fine-tuning of very large models on consumer hardware. This technique can reduce memory requirements by up to 4x compared to standard LoRA.
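As a rough sketch of what that combination looks like in practice (assuming `bitsandbytes` is installed; the model, dataset, and quantization settings below are illustrative rather than the guide's exact configuration):

```python
# Hedged QLoRA sketch: load the base model in 4-bit, then train LoRA adapters on top.
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from trl import SFTConfig, SFTTrainer

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for speed/stability
    bnb_4bit_use_double_quant=True,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    quantization_config=bnb_config,
    device_map="auto",
)

peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(output_dir="qwen2-qlora"),
    train_dataset=load_dataset("trl-lib/Capybara", split="train"),
    peft_config=peft_config,  # LoRA adapters sit on top of the frozen 4-bit base model
)
trainer.train()
```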