Skip to content

Conversation

@zhuohan123
Copy link
Member

@zhuohan123 zhuohan123 commented Oct 25, 2025

…n (fix DP slow startup time &c) (#26709)"

This reverts commit 237cf6d.

Purpose

Revert #26709.

#26709 is breaking external launcher mode DP. We will add more tests to make sure it is catched in CI. Also in general I am confused on why this PR is written in the way that we need to hack the local dp rank when initializing devices.

Test Plan

Existing tests should pass and will add a external launcher mode PR.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

…n (fix DP slow startup time &c) (#26709)"

This reverts commit 237cf6d.
@zhuohan123 zhuohan123 added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 25, 2025
Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request reverts a previous commit that aimed to optimize data parallel startup time by removing the use of CUDA_VISIBLE_DEVICES. As noted in the description, that change introduced a regression, breaking the external launcher mode for data parallelism. The revert restores the use of CUDA_VISIBLE_DEVICES for device selection and removes the manual calculation of device ranks in worker initialization. The changes are consistent across all modified files and appear to be a correct and necessary step to fix the regression. I have not found any issues of high or critical severity in this revert.

@zhuohan123 zhuohan123 enabled auto-merge (squash) October 25, 2025 04:16
@zhuohan123 zhuohan123 merged commit 56ed760 into main Oct 25, 2025
59 of 61 checks passed
@zhuohan123 zhuohan123 deleted the zhuohan/revert-26709-correct branch October 25, 2025 05:31
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
vllm-project#27502)

Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
vllm-project#27502)

Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kv-connector ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants