Commit 3f64ade

hmellor authored and sergiopaniego committed
Apply suggestions from code review
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
1 parent fafadb7 commit 3f64ade

1 file changed: +18 -1 lines changed

docs/training/trl.md

Lines changed: 18 additions & 1 deletion
@@ -4,9 +4,26 @@ Transformers Reinforcement Learning (TRL) is a full stack library that provides
 
 Online methods such as GRPO or Online DPO require the model to generate completions. vLLM can be used to generate these completions!
 
-See the guide [vLLM for fast generation in online methods](https://huggingface.co/docs/trl/main/en/speeding_up_training#vllm-for-fast-generation-in-online-methods) in the TRL documentation for more information.
+See the [vLLM integration guide](https://huggingface.co/docs/trl/main/en/vllm_integration) in the TRL documentation for more information.
+
+TRL currently supports the following online trainers with vLLM:
+
+- [GRPO](https://huggingface.co/docs/trl/main/en/grpo_trainer)
+- [Online DPO](https://huggingface.co/docs/trl/main/en/online_dpo_trainer)
+- [RLOO](https://huggingface.co/docs/trl/main/en/rloo_trainer)
+- [Nash-MD](https://huggingface.co/docs/trl/main/en/nash_md_trainer)
+- [XPO](https://huggingface.co/docs/trl/main/en/xpo_trainer)
+
+To enable vLLM in TRL, set the `use_vllm` flag in the trainer configuration to `True`. You can control how vLLM operates during training with the `vllm_mode` parameter, which supports two modes:
+
+1. **Server mode**
+2. **Colocate mode**
+
+Some trainers also support **vLLM sleep mode**, which offloads parameters and caches to CPU RAM during training, helping reduce GPU memory usage. Learn more in the [memory optimization docs](https://huggingface.co/docs/trl/main/en/reducing_memory_usage#vllm-sleep-mode).
+
 
 !!! info
     For more information on the `use_vllm` flag you can provide to the configs of these online methods, see:
+
     - [`trl.GRPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/grpo_trainer#trl.GRPOConfig.use_vllm)
     - [`trl.OnlineDPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/online_dpo_trainer#trl.OnlineDPOConfig.use_vllm)
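
The added text names the `use_vllm` flag and the `vllm_mode` parameter but does not show them in use. The following is a minimal sketch, not the library's own code. The helper `vllm_kwargs` is hypothetical, and the string values `"server"` and `"colocate"` are assumed spellings of the two modes listed in the diff. Check the TRL config reference for the version you use.

```python
from typing import Any

# Assumed string values for the two modes named in the docs change.
VLLM_MODES = ("server", "colocate")


def vllm_kwargs(mode: str) -> dict[str, Any]:
    """Build the vLLM-related keyword arguments for an online trainer config.

    Hypothetical helper for illustration; it is not part of TRL.
    """
    if mode not in VLLM_MODES:
        raise ValueError(f"vllm_mode must be one of {VLLM_MODES}, got {mode!r}")
    return {"use_vllm": True, "vllm_mode": mode}


if __name__ == "__main__":
    kwargs = vllm_kwargs("colocate")
    print(kwargs)
    # With TRL installed, the result could be unpacked into a trainer config,
    # for example (assumption, not executed here):
    #   from trl import GRPOConfig
    #   config = GRPOConfig(output_dir="out", **kwargs)
```

Validating the mode before the config is built surfaces a typo such as `"colocated"` immediately, not at trainer start-up.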

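
The diff says only that some trainers support vLLM sleep mode, so it can help to check a given config class before setting the option. The sketch below assumes the option is exposed as a dataclass field named `vllm_enable_sleep_mode`. That name is an assumption to verify against the linked memory optimization docs. The two config classes are hypothetical stand-ins, not TRL classes.

```python
from dataclasses import dataclass, fields, is_dataclass

# Assumed field name; confirm it in the config reference of your TRL version.
SLEEP_MODE_FIELD = "vllm_enable_sleep_mode"


def supports_sleep_mode(config_cls: type) -> bool:
    """Return True if the dataclass `config_cls` declares the sleep-mode field."""
    if not is_dataclass(config_cls):
        return False
    return any(f.name == SLEEP_MODE_FIELD for f in fields(config_cls))


# Hypothetical stand-ins for trainer config classes.
@dataclass
class ConfigWithSleep:
    use_vllm: bool = False
    vllm_enable_sleep_mode: bool = False


@dataclass
class ConfigWithoutSleep:
    use_vllm: bool = False


if __name__ == "__main__":
    print(supports_sleep_mode(ConfigWithSleep))
    print(supports_sleep_mode(ConfigWithoutSleep))
```

The same check could be applied to a real config class, since TRL trainer configs are dataclasses.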