[BugFix] Make DP work with connector-delayed new requests #18559
Signed-off-by: Nick Hill <nhill@redhat.com>
Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Will Eaton <weaton@redhat.com> Signed-off-by: Nick Hill <nhill@redhat.com>
3b60a20 to 462d7c1
can we add the bool to EngineCoreOutputs?
@simon-mo I guess the main reason for not doing that is that the api-server-scaleout PR will change this
This pull request has merge conflicts that must be resolved before it can be merged.
# Conflicts: # vllm/forward_context.py
Signed-off-by: Nick Hill <nhill@redhat.com>
…ct#18559) Signed-off-by: Nick Hill <nhill@redhat.com> Co-authored-by: Will Eaton <weaton@redhat.com> Signed-off-by: amit <amit.man@gmail.com>
The current DP engine logic assumes that the step() function always performs a forward pass when there are any running and/or waiting requests. With the new async connector changes this may no longer be the case, specifically when the only requests are in the `WAITING_FOR_REMOTE_KV` state.

These changes adjust the step() function to return a bool indicating whether a forward pass ran, and this is used to determine whether a dummy batch needs to run.
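
To illustrate the idea of step() reporting whether a forward pass ran, here is a minimal toy sketch. It is not the vLLM implementation: `ToyEngine`, `Status`, `run_iteration`, `execute_dummy_batch`, and the `engines_running` flag are hypothetical names chosen for illustration, and only the `WAITING_FOR_REMOTE_KV` state name comes from the description above.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class Status(Enum):
    WAITING = auto()
    WAITING_FOR_REMOTE_KV = auto()  # state name taken from the PR description


@dataclass
class ToyEngine:
    """Hypothetical stand-in for one DP rank's engine core."""
    requests: list = field(default_factory=list)
    forward_passes: int = 0
    dummy_batches: int = 0

    def step(self) -> bool:
        """Run one scheduling iteration; return True only if a forward pass ran."""
        # Requests still waiting on remote KV cannot be scheduled yet.
        runnable = [s for s in self.requests
                    if s is not Status.WAITING_FOR_REMOTE_KV]
        if not runnable:
            return False
        self.forward_passes += 1
        return True

    def execute_dummy_batch(self) -> None:
        """Placeholder for the dummy forward pass that keeps DP ranks in step."""
        self.dummy_batches += 1

    def run_iteration(self, engines_running: bool) -> None:
        # Assumption: a dummy batch is only needed while other ranks are active.
        executed = self.step()
        if not executed and engines_running:
            self.execute_dummy_batch()
```

In this sketch, an engine holding only `WAITING_FOR_REMOTE_KV` requests returns False from step(), so the caller runs a dummy batch instead of assuming that a forward pass already happened.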
The `set_forward_context` function is also updated to avoid performing an all-reduce of DP metadata when it's not used in the context of a forward pass. This is a minimally invasive fix, but we should probably adjust the connector API to not use the forward context.

Thanks to @wseaton for related testing/experimentation and helping to figure out what changes were needed for this.