
Multi-GPU sampling for vLLM in GRPO Trainer #2706

Open
nch0w opened this issue Jan 30, 2025 · 1 comment
Labels
✨ enhancement New feature or request 🏋 GRPO Related to GRPO

Comments


nch0w commented Jan 30, 2025

Feature request

It seems that the vLLM device can only be set via GRPOConfig.vllm_device, which is a string corresponding to a single CUDA device identifier. This implies that vLLM generation is limited to a single GPU, which can be a bottleneck for RL. It is also possible to restrict the trainer to a subset of GPUs by setting the CUDA_VISIBLE_DEVICES environment variable, but this might break TRL. Is there a more convenient way to specify multiple GPUs on a single node for training (or any hacks that would work now)? Furthermore, there might need to be more detailed configuration options for multi-node vLLM/GRPO training runs.
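For illustration, a minimal sketch of the CUDA_VISIBLE_DEVICES workaround described above (the device indices and the commented GRPOConfig line are illustrative assumptions, not a confirmed recipe; as noted, this may break TRL's own device placement):

```python
import os

# Hypothetical workaround: restrict which physical GPUs the process can see
# *before* torch/vLLM are imported. With this setting, "cuda:0" inside the
# process maps to physical GPU 2, and "cuda:1" maps to physical GPU 3.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

# GRPOConfig.vllm_device still accepts only a single device string, e.g.:
#   GRPOConfig(..., vllm_device="cuda:1")  # physical GPU 3 under the mask above
# so vLLM generation remains pinned to one GPU either way.
print(os.environ["CUDA_VISIBLE_DEVICES"])  # prints "2,3"
```

This only relocates the single vLLM GPU; it does not enable multi-GPU sampling, which is the point of the request.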

Motivation

Enhance RL training efficiency by allowing sampling on more than a single GPU.

Your contribution

N/A

@github-actions github-actions bot added ✨ enhancement New feature or request 🏋 GRPO Related to GRPO labels Jan 30, 2025
Superskyyy (Contributor) commented Jan 30, 2025

Just a question: does the current vLLM-enabled trainer work on multi-node setups with DeepSpeed?

If we want to specify where to host vLLM, the config should preferably let us appoint a dedicated node for pure inference. That would be consistent with disaggregated training infrastructures.
