
Commit cbe38d7

docs: Address PR feedback for PEFT integration guide
- Add comprehensive learning rate section with table and blog links
- Add learning_rate parameters to all code examples (SFT, DPO, GRPO, QLoRA, Prompt Tuning)
- Remove Full Training (No PEFT) sections for cleaner focus
- Remove Troubleshooting section as requested
- Document three methods of PEFT configuration (CLI, peft_config, get_peft_model)
- Enhance Resources section with TRL notebooks, examples, and Cookbook
- Simplify Python examples using ellipsis for non-PEFT configs
- Fix import order (standard library before third-party)
1 parent 7ff168e commit cbe38d7


docs/source/peft_integration.md

Lines changed: 155 additions & 64 deletions
@@ -49,6 +49,12 @@ peft_config = LoraConfig(
     task_type="CAUSAL_LM",
 )
 
+# Configure training - note the higher learning rate for LoRA (10x base rate)
+training_args = SFTConfig(
+    learning_rate=2.0e-4,  # 10x the base rate (2.0e-5) for LoRA
+    ...
+)
+
 # Create trainer with PEFT
 trainer = SFTTrainer(
     model=model,
@@ -58,6 +64,107 @@ trainer = SFTTrainer(
 )
 ```
 
+## Three Ways to Configure PEFT
+
+TRL provides three different methods to configure PEFT, each suited for different use cases:
+
+### 1. Using CLI Flags (Simplest)
+
+The easiest way to enable PEFT is using the `--use_peft` flag with the command-line interface. This method is ideal for quick experiments and standard configurations:
+
+```bash
+python trl/scripts/sft.py \
+    --model_name_or_path Qwen/Qwen2-0.5B \
+    --dataset_name trl-lib/Capybara \
+    --use_peft \
+    --lora_r 32 \
+    --lora_alpha 16 \
+    --lora_dropout 0.05 \
+    --output_dir Qwen2-0.5B-SFT-LoRA
+```
+
+**Pros**: Quick setup, no code required
+**Cons**: Limited to LoRA, fewer customization options
+
+### 2. Passing peft_config to Trainer (Recommended)
+
+For more control, pass a PEFT configuration directly to the trainer. This is the recommended approach for most use cases:
+
+```python
+from peft import LoraConfig
+from trl import SFTConfig, SFTTrainer
+
+peft_config = LoraConfig(
+    r=32,
+    lora_alpha=16,
+    lora_dropout=0.05,
+    bias="none",
+    task_type="CAUSAL_LM",
+    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
+)
+
+trainer = SFTTrainer(
+    model=model,
+    args=training_args,
+    train_dataset=dataset,
+    peft_config=peft_config,  # Pass config here
+)
+```
+
+**Pros**: Full control, supports all PEFT methods (LoRA, Prompt Tuning, etc.)
+**Cons**: Requires Python code
+
+### 3. Applying PEFT to Model Directly (Advanced)
+
+For maximum flexibility, you can apply PEFT to your model before passing it to the trainer:
+
+```python
+from peft import LoraConfig, get_peft_model
+from transformers import AutoModelForCausalLM
+from trl import SFTConfig, SFTTrainer
+
+# Load base model
+model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")
+
+# Apply PEFT configuration
+peft_config = LoraConfig(
+    r=32,
+    lora_alpha=16,
+    lora_dropout=0.05,
+    bias="none",
+    task_type="CAUSAL_LM",
+)
+model = get_peft_model(model, peft_config)
+
+# Pass PEFT-wrapped model to trainer
+trainer = SFTTrainer(
+    model=model,  # Already has PEFT applied
+    args=training_args,
+    train_dataset=dataset,
+    # Note: no peft_config needed here
+)
+```
+
+**Pros**: Maximum control, useful for custom model architectures or complex setups
+**Cons**: More verbose, requires understanding of PEFT internals
+
+## Learning Rate Considerations
+
+When using LoRA or other PEFT methods, you typically need to use a **higher learning rate** (approximately 10x) compared to full fine-tuning. This is because PEFT methods train only a small fraction of parameters, requiring a larger learning rate to achieve similar parameter updates.
+
+**Recommended learning rates:**
+
+| Trainer | Full Fine-Tuning | With LoRA (10x) |
+|---------|------------------|-----------------|
+| **SFT** | `2.0e-5` | `2.0e-4` |
+| **DPO** | `5.0e-7` | `5.0e-6` |
+| **GRPO** | `1.0e-6` | `1.0e-5` |
+| **Prompt Tuning** | N/A | `1.0e-2` to `3.0e-2` |
+
+> **Why 10x?** LoRA adapters have significantly fewer trainable parameters than the full model. A higher learning rate compensates for this reduced parameter count, ensuring effective training. For a detailed explanation, see [this blog post](https://thinkingmachines.ai/blog/lora/).
+
+For additional best practices on using LoRA effectively, refer to the [LoRA Without Regret](lora_without_regret) documentation.
+
 ## PEFT with Different Trainers
 
 TRL's trainers support PEFT configurations for various training paradigms. Below are detailed examples for each major trainer.
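
The 10x guidance above can be sanity-checked by looking at how few parameters LoRA actually trains. A minimal sketch, assuming the same `Qwen/Qwen2-0.5B` checkpoint used throughout this guide; the 10x multiplier is applied by hand here as a heuristic taken from the table, not a TRL setting:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM
from trl import SFTConfig

model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B")
peft_config = LoraConfig(r=32, lora_alpha=16, task_type="CAUSAL_LM")
model = get_peft_model(model, peft_config)

# Prints the trainable vs. total parameter counts; with LoRA only a small
# fraction of the weights receives gradients.
model.print_trainable_parameters()

# Heuristic from the table above: scale the full fine-tuning rate by ~10x for LoRA.
base_lr = 2.0e-5  # full fine-tuning SFT rate
training_args = SFTConfig(output_dir="Qwen2-0.5B-SFT-LoRA", learning_rate=10 * base_lr)
```
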
@@ -69,19 +176,6 @@ TRL's trainers support PEFT configurations for various training paradigms. Below
 
 The `SFTTrainer` is used for supervised fine-tuning on instruction datasets.
 
-#### Full Training (No PEFT)
-
-```bash
-python trl/scripts/sft.py \
-    --model_name_or_path Qwen/Qwen2-0.5B \
-    --dataset_name trl-lib/Capybara \
-    --learning_rate 2.0e-5 \
-    --num_train_epochs 1 \
-    --per_device_train_batch_size 2 \
-    --gradient_accumulation_steps 8 \
-    --output_dir Qwen2-0.5B-SFT
-```
-
 #### With LoRA
 
 ```bash
@@ -114,6 +208,12 @@ peft_config = LoraConfig(
     target_modules=["q_proj", "v_proj"],  # Optional: specify target modules
 )
 
+# Configure training with higher learning rate for LoRA
+training_args = SFTConfig(
+    learning_rate=2.0e-4,  # 10x the base rate for LoRA
+    ...
+)
+
 # Create trainer with PEFT config
 trainer = SFTTrainer(
     model=model,
@@ -132,18 +232,6 @@ trainer.train()
 
 The `DPOTrainer` implements preference learning from human feedback.
 
-#### Full Training (No PEFT)
-
-```bash
-python trl/scripts/dpo.py \
-    --model_name_or_path Qwen/Qwen2-0.5B-Instruct \
-    --dataset_name trl-lib/ultrafeedback_binarized \
-    --learning_rate 5.0e-7 \
-    --per_device_train_batch_size 2 \
-    --gradient_accumulation_steps 8 \
-    --output_dir Qwen2-0.5B-DPO
-```
-
 #### With LoRA
 
 ```bash
@@ -174,6 +262,12 @@ peft_config = LoraConfig(
     task_type="CAUSAL_LM",
 )
 
+# Configure training with higher learning rate for LoRA
+training_args = DPOConfig(
+    learning_rate=5.0e-6,  # 10x the base rate for DPO with LoRA
+    ...
+)
+
 # Create trainer with PEFT config
 trainer = DPOTrainer(
     model=model,
@@ -195,17 +289,6 @@ trainer.train()
 
 The `GRPOTrainer` optimizes policies using group-based rewards.
 
-#### Full Training (No PEFT)
-
-```bash
-python trl/scripts/grpo.py \
-    --model_name_or_path Qwen/Qwen2-0.5B \
-    --dataset_name trl-lib/math-reasoning \
-    --learning_rate 1.0e-6 \
-    --per_device_train_batch_size 2 \
-    --output_dir Qwen2-0.5B-GRPO
-```
-
 #### With LoRA
 
 ```bash
@@ -235,6 +318,12 @@ peft_config = LoraConfig(
     task_type="CAUSAL_LM",
 )
 
+# Configure training with higher learning rate for LoRA
+training_args = GRPOConfig(
+    learning_rate=1.0e-5,  # 10x the base rate for GRPO with LoRA
+    ...
+)
+
 # Create trainer with PEFT config
 trainer = GRPOTrainer(
     model="Qwen/Qwen2-0.5B",  # Can pass model name or loaded model
@@ -282,10 +371,11 @@ python trl/scripts/sft.py \
 #### Python Example
 
 ```python
-from transformers import AutoModelForCausalLM, BitsAndBytesConfig
+import torch
+
 from peft import LoraConfig
+from transformers import AutoModelForCausalLM, BitsAndBytesConfig
 from trl import SFTConfig, SFTTrainer
-import torch
 
 # Configure 4-bit quantization
 bnb_config = BitsAndBytesConfig(
@@ -311,6 +401,12 @@ peft_config = LoraConfig(
     task_type="CAUSAL_LM",
 )
 
+# Configure training with higher learning rate for LoRA
+training_args = SFTConfig(
+    learning_rate=2.0e-4,  # 10x the base rate for QLoRA
+    ...
+)
+
 # Create trainer with PEFT config
 trainer = SFTTrainer(
     model=model,
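
To see the effect of the 4-bit loading used in the QLoRA example above, a minimal sketch; it assumes `bitsandbytes` is installed, reuses the same checkpoint, and relies on `get_memory_footprint()`, a standard `transformers` model method:

```python
import torch

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization, as configured in the QLoRA example above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2-0.5B",
    quantization_config=bnb_config,
)

# Weight memory of the quantized model, in GB; compare with a plain bf16 load
print(f"4-bit footprint: {model.get_memory_footprint() / 1e9:.2f} GB")
```
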
@@ -327,9 +423,10 @@ trainer.train()
 The `BitsAndBytesConfig` provides several options to optimize memory and performance:
 
 ```python
-from transformers import BitsAndBytesConfig
 import torch
 
+from transformers import BitsAndBytesConfig
+
 bnb_config = BitsAndBytesConfig(
     load_in_4bit=True,
     bnb_4bit_quant_type="nf4",  # or "fp4"
@@ -396,6 +493,12 @@ peft_config = PromptTuningConfig(
     tokenizer_name_or_path="Qwen/Qwen2-0.5B",
 )
 
+# Configure training with higher learning rate for Prompt Tuning
+training_args = SFTConfig(
+    learning_rate=2.0e-2,  # Prompt Tuning typically uses 1e-2 to 3e-2
+    ...
+)
+
 # Create trainer with PEFT config
 trainer = SFTTrainer(
     model=model,
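
For context, a minimal Prompt Tuning setup consistent with the snippet above; the `num_virtual_tokens` value and the init text are illustrative choices, not values prescribed by this guide:

```python
from peft import PromptTuningConfig, PromptTuningInit, TaskType
from trl import SFTConfig

peft_config = PromptTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    prompt_tuning_init=PromptTuningInit.TEXT,
    prompt_tuning_init_text="Answer the following question concisely:",  # illustrative
    num_virtual_tokens=8,  # illustrative; tune for your task
    tokenizer_name_or_path="Qwen/Qwen2-0.5B",
)

# Only the virtual prompt embeddings are trained, which is why the much
# higher learning rate (1e-2 to 3e-2) is appropriate here.
training_args = SFTConfig(
    output_dir="Qwen2-0.5B-SFT-PromptTuning",
    learning_rate=2.0e-2,
)
```
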
@@ -584,34 +687,22 @@ accelerate launch trl/scripts/sft.py \
     --lora_r 32
 ```
 
-## Troubleshooting
-
-### Out of Memory Errors
-
-If you encounter OOM errors:
-
-1. Enable QLoRA: `--load_in_4bit`
-2. Reduce batch size: `--per_device_train_batch_size 1`
-3. Increase gradient accumulation: `--gradient_accumulation_steps 16`
-4. Enable gradient checkpointing: `--gradient_checkpointing`
-5. Reduce LoRA rank: `--lora_r 8`
-6. Reduce target modules: `--lora_target_modules q_proj v_proj`
-
-### Slow Training
+## Resources
 
-If training is slow:
+### TRL Examples and Notebooks
 
-1. Increase batch size (if memory allows)
-2. Use Flash Attention 2: `--attn_implementation flash_attention_2`
-3. Use bf16: `--bf16`
-4. Reduce gradient checkpointing frequency
+- **[SFT with LoRA/QLoRA Notebook](https://github.com/huggingface/trl/blob/main/examples/notebooks/sft_trl_lora_qlora.ipynb)** - Complete working example showing both LoRA and QLoRA implementations
+- **[TRL Examples Directory](https://github.com/huggingface/trl/tree/main/examples)** - Collection of training scripts demonstrating PEFT with different trainers
+- **[TRL Cookbook Recipes](https://github.com/huggingface/cookbook/tree/main/notebooks/transformers)** - Step-by-step guides for common PEFT training scenarios
 
+### Documentation
 
+- [PEFT Documentation](https://huggingface.co/docs/peft) - Official PEFT library documentation
+- [TRL Documentation](https://huggingface.co/docs/trl) - Complete TRL documentation with trainer guides
+- [LoRA Without Regret](lora_without_regret) - Best practices for using LoRA effectively
 
-## Resources
+### Research Papers
 
-- [PEFT Documentation](https://huggingface.co/docs/peft)
-- [LoRA Paper](https://arxiv.org/abs/2106.09685)
-- [QLoRA Paper](https://arxiv.org/abs/2305.14314)
-- [Prompt Tuning Paper](https://arxiv.org/abs/2104.08691)
-- [TRL Documentation](https://huggingface.co/docs/trl)
+- [LoRA Paper](https://arxiv.org/abs/2106.09685) - Original LoRA methodology and results
+- [QLoRA Paper](https://arxiv.org/abs/2305.14314) - Efficient finetuning with 4-bit quantization
+- [Prompt Tuning Paper](https://arxiv.org/abs/2104.08691) - The Power of Scale for Parameter-Efficient Prompt Tuning
