Updated TRL integration docs

sergiopaniego · sergiopaniego · commit 303251a451d5 · 2025-09-29T15:44:21.000+02:00
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
diff --git a/docs/training/trl.md b/docs/training/trl.md
@@ -1,6 +1,6 @@
 # Transformers Reinforcement Learning
 
-Transformers Reinforcement Learning (TRL) is a full-stack library that provides a set of tools to train transformer language models with methods like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.
+[Transformers Reinforcement Learning](https://huggingface.co/docs/trl) (TRL) is a full-stack library that provides a set of tools to train transformer language models with methods like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.
 
 Online methods such as GRPO or Online DPO require the model to generate completions. vLLM can be used to generate these completions!
 
@@ -21,9 +21,9 @@ To enable vLLM in TRL, set the `use_vllm` flag in the trainer configuration to `
 
 Some trainers also support **vLLM sleep mode**, which offloads parameters and caches to CPU RAM during training, reducing GPU memory usage. Learn more in the [memory optimization docs](https://huggingface.co/docs/trl/main/en/reducing_memory_usage#vllm-sleep-mode).
 
-
 !!! info
     For more information on the `use_vllm` flag accepted by the configs of these online methods, see:
 
     - [`trl.GRPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/grpo_trainer#trl.GRPOConfig.use_vllm)
     - [`trl.OnlineDPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/online_dpo_trainer#trl.OnlineDPOConfig.use_vllm)
+    - [`trl.RLOOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/rloo_trainer#trl.RLOOConfig.use_vllm)
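As an illustration of the `use_vllm` flag covered by this change, the sketch below enables vLLM generation for GRPO. It assumes a recent TRL release in which `trl.GRPOConfig` accepts the `output_dir` and `use_vllm` keyword arguments. The helper `make_grpo_config`, the dictionary `vllm_config_kwargs`, and the output directory name are hypothetical and exist only for this example. TRL is imported inside the helper so the snippet can be loaded even where TRL is not installed.

```python
# Minimal sketch, assuming a recent TRL release in which GRPOConfig accepts
# a boolean `use_vllm` field. `vllm_config_kwargs`, `make_grpo_config` and
# the output directory are hypothetical names used only for illustration.

# Keyword arguments for the trainer config; `use_vllm=True` asks the trainer
# to generate completions with vLLM instead of the default transformers generation.
vllm_config_kwargs = {
    "output_dir": "grpo-vllm-example",  # hypothetical checkpoint/log directory
    "use_vllm": True,
}


def make_grpo_config(**overrides):
    """Build a trl.GRPOConfig with vLLM generation enabled.

    TRL is imported lazily so this module loads without TRL installed;
    actually calling the function requires TRL.
    """
    from trl import GRPOConfig  # assumption: OnlineDPOConfig/RLOOConfig work the same way

    # Caller-supplied overrides take precedence over the defaults above.
    return GRPOConfig(**{**vllm_config_kwargs, **overrides})
```

Calling `make_grpo_config()` would return a config that can be passed as `args` to `trl.GRPOTrainer`. Swapping the import for `trl.OnlineDPOConfig` or `trl.RLOOConfig` should follow the same pattern, assuming those configs accept the same keyword arguments in your installed TRL version.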