
Multi-GPU sampling for vLLM in GRPO Trainer #2706

Open
nch0w opened this issue Jan 30, 2025 · 1 comment
Labels
✨ enhancement New feature or request 🏋 GRPO Related to GRPO

Comments


nch0w commented Jan 30, 2025

Feature request

It seems that the vLLM device can only be set via GRPOConfig.vllm_device, which is a string corresponding to a single CUDA device identifier. This implies that vLLM generation is limited to a single GPU, which can be a bottleneck for RL. It is also possible to restrict the trainer to a subset of GPUs by setting the CUDA_VISIBLE_DEVICES environment variable, but this might break TRL. Is there a more convenient way to specify multiple GPUs on a single node for training (or any hacks that would work now)? Furthermore, there might need to be more detailed configuration options for multi-node vLLM/GRPO training runs.
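For illustration, a minimal sketch of the CUDA_VISIBLE_DEVICES workaround described above (the device indices and the commented GRPOConfig line are illustrative assumptions, not a confirmed recipe; as noted, this may break TRL's own device placement):

```python
import os

# Hypothetical workaround: restrict which physical GPUs the process can see
# *before* torch/vLLM are imported. With this setting, "cuda:0" inside the
# process maps to physical GPU 2, and "cuda:1" maps to physical GPU 3.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

# GRPOConfig.vllm_device still accepts only a single device string, e.g.:
#   GRPOConfig(..., vllm_device="cuda:1")  # physical GPU 3 under the mask above
# so vLLM generation remains pinned to one GPU either way.
print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints "2,3"
```

This only relocates the single vLLM GPU; it does not enable multi-GPU sampling, which is the point of the request.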

Motivation

Enhance RL training efficiency by allowing sampling on more than a single GPU.

Your contribution

N/A

@github-actions github-actions bot added ✨ enhancement New feature or request 🏋 GRPO Related to GRPO labels Jan 30, 2025
Superskyyy (Contributor) commented Jan 30, 2025

Just a question: does the current vLLM-enabled trainer work on multi-node setups with DeepSpeed?

If we want to specify where to host vLLM, the config should preferably let us appoint a dedicated node for pure inference. That would be consistent with disaggregated training infrastructures.
