
Commit 303251a

Updated TRL integration docs
Signed-off-by: sergiopaniego <sergiopaniegoblanco@gmail.com>
1 parent e87da96 commit 303251a

1 file changed (+2, -2 lines)

docs/training/trl.md

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # Transformers Reinforcement Learning
 
-Transformers Reinforcement Learning (TRL) is a full stack library that provides a set of tools to train transformer language models with methods like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.
+[Transformers Reinforcement Learning](https://huggingface.co/docs/trl) (TRL) is a full stack library that provides a set of tools to train transformer language models with methods like Supervised Fine-Tuning (SFT), Group Relative Policy Optimization (GRPO), Direct Preference Optimization (DPO), Reward Modeling, and more. The library is integrated with 🤗 transformers.
 
 Online methods such as GRPO or Online DPO require the model to generate completions. vLLM can be used to generate these completions!
 
@@ -21,9 +21,9 @@ To enable vLLM in TRL, set the `use_vllm` flag in the trainer configuration to `
 
 Some trainers also support **vLLM sleep mode**, which frees the GPU memory held by vLLM's parameters and caches during training, helping reduce memory usage. Learn more in the [memory optimization docs](https://huggingface.co/docs/trl/main/en/reducing_memory_usage#vllm-sleep-mode).
 
-
 !!! info
 For more information on the `use_vllm` flag you can provide to the configs of these online methods, see:
 
 - [`trl.GRPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/grpo_trainer#trl.GRPOConfig.use_vllm)
 - [`trl.OnlineDPOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/online_dpo_trainer#trl.OnlineDPOConfig.use_vllm)
+- [`trl.RLOOConfig.use_vllm`](https://huggingface.co/docs/trl/main/en/rloo_trainer#trl.RLOOConfig.use_vllm)