Feature request
It seems that the vLLM device can only be set via GRPOConfig.vllm_device, a string corresponding to a single CUDA device identifier. This implies that vLLM generation is limited to one GPU, which can become a bottleneck for RL. It is also possible to restrict the visible GPUs by setting the CUDA_VISIBLE_DEVICES environment variable, but this might break TRL. Is there a more convenient way to specify multiple GPUs on a single node for training (or any hacks that would work now)? Furthermore, multi-node vLLM/GRPO training runs might need more detailed configuration options.
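For reference, here is a minimal sketch of the CUDA_VISIBLE_DEVICES workaround mentioned above. The GRPOConfig usage is shown only in comments, since the exact fields depend on your TRL version; treat it as an assumption, not confirmed API:

```python
import os

# Workaround sketch: restrict which GPUs this process sees, *before* any
# CUDA library is initialized. With this setting the process sees only
# physical GPUs 2 and 3, remapped to "cuda:0" and "cuda:1" respectively.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"

# With the remapping above, something like the following (assumed TRL API,
# check your installed version) would place vLLM on physical GPU 2:
#
#   from trl import GRPOConfig
#   config = GRPOConfig(use_vllm=True, vllm_device="cuda:0", ...)
#
# Caveat from the issue text: TRL's own device bookkeeping may not expect
# a remapped device space, so this can break things.
print(os.environ["CUDA_VISIBLE_DEVICES"])
```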
Motivation
Improve RL training efficiency by allowing sampling on more than a single GPU.
Your contribution
N/A
Just a question: does the current vLLM-enabled trainer work on multi-node setups with DeepSpeed?
If we want to specify where vLLM is hosted, the config should preferably allow appointing a dedicated node for pure inference. That would be consistent with existing disaggregated training infrastructures.
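To make the suggestion concrete, a dedicated-inference-node config could look something like the sketch below. This is purely hypothetical: none of these fields exist in TRL today, and the names are made up for illustration:

```python
# Hypothetical sketch of a disaggregated GRPO placement config.
# None of these fields exist in TRL today; names are illustrative only.
from dataclasses import dataclass, field

@dataclass
class VLLMPlacement:
    # Node(s) reserved for vLLM inference, separate from training nodes.
    inference_nodes: list = field(default_factory=lambda: ["node-4"])
    # Tensor-parallel degree across the GPUs of each inference node.
    tensor_parallel_size: int = 4

placement = VLLMPlacement()
print(placement.inference_nodes, placement.tensor_parallel_size)
```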