Apply suggestions from code review

hmellor · sergiopaniego · commit 3f64ade1786d · 2025-09-29T15:58:58.000+02:00
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
diff --git a/docs/training/trl.md b/docs/training/trl.md
@@ -4,9 +4,26 @@ Transformers Reinforcement Learning (TRL) is a full stack library that provides
 
 Online methods such as GRPO or Online DPO require the model to generate completions. vLLM can be used to generate these completions!
 
-See the guide [vLLM for fast generation in online methods](https://huggingface.co/docs/trl/main/en/speeding_up_training#vllm-for-fast-generation-in-online-methods) in the TRL documentation for more information.
+See the [vLLM integration guide](https://huggingface.co/docs/trl/main/en/vllm_integration) in the TRL documentation for more information.
+
+TRL currently supports the following online trainers with vLLM:
+
+- [GRPO](https://huggingface.co/docs/trl/main/en/grpo_trainer)
+- [Online DPO](https://huggingface.co/docs/trl/main/en/online_dpo_trainer)
+- [RLOO](https://huggingface.co/docs/trl/main/en/rloo_trainer)
+- [Nash-MD](https://huggingface.co/docs/trl/main/en/nash_md_trainer)
+- [XPO](https://huggingface.co/docs/trl/main/en/xpo_trainer)
+
+To enable vLLM in TRL, set the `use_vllm` flag in the trainer configuration to `True`. You can control how vLLM operates during training with the `vllm_mode` parameter, which supports two modes:
+
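+1. **Server mode**: vLLM runs as a separate server process, and the trainer sends it generation requests.
+2. **Colocate mode**: vLLM runs inside the training process and shares GPUs with the trainer.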
+
+Some trainers also support **vLLM sleep mode**, which offloads vLLM parameters and caches to CPU RAM during training, helping reduce GPU memory usage. Learn more in the [memory optimization docs](https://huggingface.co/docs/trl/main/en/reducing_memory_usage#vllm-sleep-mode).
+
 
 !!! info
     For more information on the `use_vllm` flag you can provide to the configs of these online methods, see:
+
     - [`trl.GRPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/grpo_trainer#trl.GRPOConfig.use_vllm)
     - [`trl.OnlineDPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/online_dpo_trainer#trl.OnlineDPOConfig.use_vllm)
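
As a rough illustration of the `use_vllm` and `vllm_mode` settings described in the added text, the sketch below builds the keyword arguments for a trainer config. The helper names `vllm_kwargs` and `build_grpo_config`, the `"server"` default chosen for the helper, and the `output_dir` value are assumptions made for this example and are not part of the patch. Only `use_vllm`, `vllm_mode`, and the two mode names come from the documentation above.

```python
# Hypothetical helper, not part of TRL: collects the two vLLM-related
# settings named in the docs into keyword arguments for a trainer config.
VALID_VLLM_MODES = ("server", "colocate")


def vllm_kwargs(mode: str = "server") -> dict:
    """Return config keyword arguments that enable vLLM in the given mode."""
    if mode not in VALID_VLLM_MODES:
        raise ValueError(
            f"vllm_mode must be one of {VALID_VLLM_MODES}, got {mode!r}"
        )
    return {"use_vllm": True, "vllm_mode": mode}


def build_grpo_config(mode: str = "server"):
    """Build a GRPOConfig with vLLM enabled.

    Requires `trl` to be installed; the import is deferred so that the
    rest of this sketch can run without it. The output directory name is
    an arbitrary placeholder.
    """
    from trl import GRPOConfig

    return GRPOConfig(output_dir="grpo-vllm-demo", **vllm_kwargs(mode))


if __name__ == "__main__":
    # Only the plain dictionary is built here, so no GPU or TRL install is needed.
    print(vllm_kwargs("colocate"))
```

The same keyword arguments should apply to the other config classes that expose `use_vllm`, such as `OnlineDPOConfig`, though the set of supported modes may differ by trainer and TRL version, so check the linked references before relying on it.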