[Misc] Use envs.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE #15831

ruisearch42 · 2025-03-31T17:35:25Z

Use envs.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE instead of envs.VLLM_USE_RAY_COMPILED_DAG_NCCL_CHANNEL.

Since Ray 2.42, calling with_tensor_transport(transport='auto') makes Compiled Graph use NCCL whenever both the sender and the receiver are on GPU (rather than CPU), which is almost always the case for vLLM. This PR allows overriding the channel type through the env var to use shared memory instead, which is useful for debugging. It also opens up the possibility of supporting additional channel types in the future, for example for different hardware.

The original VLLM_USE_RAY_COMPILED_DAG_NCCL_CHANNEL is rarely changed, so backward compatibility should not be a concern.
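To illustrate the override described above, here is a minimal sketch of reading and validating the new env var. The helper name `resolve_channel_type`, the `"auto"` default, and the error message are assumptions made for illustration; only the variable name and the three values ("auto", "nccl", "shm") come from this PR, and the actual code in vllm/envs.py and the executor may differ.

```python
import os
from typing import Mapping

# Variable name and allowed values are taken from the PR description and diff.
ENV_NAME = "VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE"
ALLOWED_CHANNEL_TYPES = ("auto", "nccl", "shm")


def resolve_channel_type(environ: Mapping[str, str]) -> str:
    """Hypothetical helper: return the requested channel type.

    Assumes the variable defaults to "auto" when unset.
    """
    value = environ.get(ENV_NAME, "auto")
    if value not in ALLOWED_CHANNEL_TYPES:
        raise ValueError(
            f"Invalid {ENV_NAME}={value!r}; "
            f"expected one of {ALLOWED_CHANNEL_TYPES}")
    return value


if __name__ == "__main__":
    # Reads the real process environment; e.g. export the variable as
    # "shm" before launching to force shared memory while debugging.
    print(resolve_channel_type(os.environ))
```

The resolved string could then be passed as the `transport` argument of `with_tensor_transport`, which is the call the description refers to.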

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

github-actions · 2025-03-31T17:35:36Z

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

vllm/envs.py

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

vllm/executor/ray_distributed_executor.py

comaniac · 2025-03-31T22:26:15Z

vllm/envs.py

+    # - "auto": use the default channel type
+    # - "nccl": use NCCL for communication
+    # - "shm": use shared memory and gRPC for communication
+    # This flag is ignored if VLLM_USE_RAY_COMPILED_DAG is not set.


IIRC we enable CG in v1 by default? If so we should make it clear here.

Yeah, this statement is technically correct :) In V1 with ray backend, we set VLLM_USE_RAY_COMPILED_DAG to 1.
I added a comment at the definition of VLLM_USE_RAY_COMPILED_DAG.
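The interaction discussed in this thread can be sketched as follows. This is an illustrative sketch only: the function name `effective_channel_type`, the `use_v1_ray_backend` parameter, the integer-style parsing of VLLM_USE_RAY_COMPILED_DAG, and the `"auto"` default are assumptions, not the code added by this PR.

```python
from typing import Mapping, Optional


def effective_channel_type(environ: Mapping[str, str],
                           use_v1_ray_backend: bool) -> Optional[str]:
    """Hypothetical helper: the channel type that takes effect, or None.

    The channel type flag is ignored unless Compiled Graph is enabled.
    Per the reply above, V1 with the Ray backend sets
    VLLM_USE_RAY_COMPILED_DAG to 1, so it counts as enabled there.
    """
    # Assumption: the flag is parsed as an integer-style boolean ("0"/"1").
    compiled_dag = bool(int(environ.get("VLLM_USE_RAY_COMPILED_DAG", "0")))
    if use_v1_ray_backend:
        compiled_dag = True
    if not compiled_dag:
        return None  # channel type setting has no effect
    # Assumption: "auto" is the default when the variable is unset.
    return environ.get("VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE", "auto")
```

For example, with only the channel type set to "shm" and Compiled Graph left disabled outside V1, this sketch returns None, which mirrors the "flag is ignored" comment in the diff.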

Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

…15831) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: xinyuxiao <xinyuxiao2024@gmail.com>

…15831) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>

…15831) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk>

…15831) Signed-off-by: Rui Qiao <ruisearch42@gmail.com> Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> Co-authored-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Mu Huai <tianbowen.tbw@antgroup.com>

[Misc] Use env.VLLM_USE_RAY_COMPILED_DAG_CHANNEL_TYPE

982a152

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

ruisearch42 assigned simon-mo and youkaichao Mar 31, 2025

simon-mo approved these changes Mar 31, 2025

ruisearch42 added 3 commits March 31, 2025 17:49

up

24520cd

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

up

e29c0c9

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

up

e241be3

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

comaniac approved these changes Mar 31, 2025

ruisearch42 and others added 2 commits March 31, 2025 22:51

Update vllm/executor/ray_distributed_executor.py

0f17c58

Co-authored-by: Cody Yu <hao.yu.cody@gmail.com> Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

up

89c563e

Signed-off-by: Rui Qiao <ruisearch42@gmail.com>

ruisearch42 force-pushed the channel_type branch from 6e12e19 to 89c563e Compare March 31, 2025 22:56

comaniac added the ready label (ONLY add when PR is ready to merge/full CI is needed) Mar 31, 2025

Merge branch 'main' into channel_type

4781fd6

vllm-bot merged commit 8dd41d6 into vllm-project:main Apr 1, 2025
31 of 34 checks passed

ckhordiasma mentioned this pull request Apr 17, 2025

[do not merge] pr test for nm changes into 2.20 red-hat-data-services/vllm#107

Closed

6 participants
