Skip to content

Conversation

@njhill
Copy link
Member

@njhill njhill commented May 22, 2025

The current DP engine logic assumes that the step() function will always perform a forward pass when there are any running and/or waiting requests. With the new async connector changes this might no longer be the case, specifically when there's only requests in WAITING_FOR_REMOTE_KV state.

These changes adjust the step() function to return a bool indicating whether a forward pass ran, and this is used to determine whether a dummy batch needs to run.

The set_forward_context function is also updated to avoid performing an all-reduce of DP metadata when it's not used in the context of a forward-pass. This is a minimally-invasive fix, but we should probably adjust the connector API to not use the forward context.

Thanks to @wseaton for related testing/experimentation and helping to figure out what changes were needed for this.

Signed-off-by: Nick Hill <nhill@redhat.com>
@github-actions
Copy link

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run fastcheck CI which starts running only a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on Buildkite UI (linked in the PR checks section) and unblock them. If you do not have permission to unblock, ping simon-mo or khluu to add you in our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either: Add ready label to the PR or enable auto-merge.

🚀

@mergify mergify bot added the v1 label May 22, 2025
njhill added 2 commits May 22, 2025 16:23
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
@njhill njhill marked this pull request as ready for review May 27, 2025 21:40
Signed-off-by: Nick Hill <nhill@redhat.com>

Co-authored-by: Will Eaton <weaton@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com>
@njhill njhill force-pushed the fix-dp-with-delayed-req branch from 3b60a20 to 462d7c1 Compare May 27, 2025 23:52
Copy link
Collaborator

@simon-mo simon-mo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we add the bool to EngineCoreOutputs?

@njhill
Copy link
Member Author

njhill commented May 28, 2025

can we add the bool to EngineCoreOutputs?

@simon-mo I guess the main reason for not doing that is that the api-server-scaleout PR will change this step() to return a dict of client index -> EngineCoreOutputs, and the bool flag indicating whether a forward pass happened is kind of separate from this.

@tlrmchlsmth tlrmchlsmth added the ready ONLY add when PR is ready to merge/full CI is needed label May 28, 2025
njhill added 2 commits May 28, 2025 15:49
Signed-off-by: Nick Hill <nhill@redhat.com>
@mergify
Copy link

mergify bot commented May 28, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @njhill.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label May 28, 2025
@tlrmchlsmth tlrmchlsmth enabled auto-merge (squash) May 29, 2025 14:32
@tlrmchlsmth tlrmchlsmth merged commit d1d61f3 into vllm-project:main May 29, 2025
63 of 66 checks passed
@njhill njhill deleted the fix-dp-with-delayed-req branch May 29, 2025 22:11
amitm02 pushed a commit to amitm02/vllm that referenced this pull request Jun 1, 2025
…ct#18559)

Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Will Eaton <weaton@redhat.com>
Signed-off-by: amit <amit.man@gmail.com>
amitm02 pushed a commit to amitm02/vllm that referenced this pull request Jun 1, 2025
…ct#18559)

Signed-off-by: Nick Hill <nhill@redhat.com>
Co-authored-by: Will Eaton <weaton@redhat.com>
Signed-off-by: amit <amit.man@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants