Skip to content

Conversation

@wseaton
Copy link
Contributor

@wseaton wseaton commented May 21, 2025

Draft PR, testing with 1P1D where DP=2 for D

This is a stacked PR, it has merged in commits already present in: #18559

wseaton added 10 commits May 20, 2025 21:36
This reverts commit 637b92d.
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
@github-actions
Copy link

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

wseaton added 2 commits May 21, 2025 12:57
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
@mergify mergify bot added the v1 label May 21, 2025
wseaton added 3 commits May 21, 2025 20:05
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
wseaton and others added 6 commits May 21, 2025 21:30
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
wseaton and others added 4 commits May 22, 2025 15:36
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
@mergify
Copy link

mergify bot commented May 23, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @wseaton.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 23, 2025
wseaton and others added 6 commits May 27, 2025 10:35
Signed-off-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
Co-authored-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Will Eaton <weaton@redhat.com>
@mergify mergify bot removed the needs-rebase label May 27, 2025
@wseaton
Copy link
Contributor Author

wseaton commented May 28, 2025

Once #18559 is merged I will pluck the relevant remaining changes into a clean branch.

Comment on lines +340 to +347
# NOTE(weaton): Hack to get rank uniqueness across TP + DP ranks.
# Used for NIXL side channel communication.
self.unique_rank = (get_dp_group().rank_in_group *
self.world_size) + self.rank
logger.debug(
"NIXL worker %s TP rank %s, DP local rank %s, unique rank %s",
self.engine_id, self.rank,
get_dp_group().local_rank, self.unique_rank)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of construction the unique rank, can we do self.world_rank = get_world_group().rank_in_group? This should be less hacky and right for PP as well

=======
return (engine_core_outputs,
scheduler_output.total_num_scheduled_tokens > 0)
>>>>>>> njhill/fix-dp-with-delayed-req
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some merge conflict markers to clean up here

Comment on lines +1471 to +1478
# The KVConnector API takes in a ForwardContext here because it might need it
# in certain cases, in the `NixlConnector` path we basically ignore it. A
# future note here is to consider changing this to Optional[ForwardContext].
# This work could also better be moved elsewhere.
#
# NOTE(weaton): Passing None here because getting the forward context
# is a blocking operation in the DP case.
kv_connector.start_load_kv(None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've basically dodged this issue with the forward context changes in #18559, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, #18559 is tested to fix this when this change is reverted, so all good

@wseaton wseaton closed this May 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants