-
-
Notifications
You must be signed in to change notification settings - Fork 11k
[Bugfix] Do not crash V0 engine on input errors #13101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Bugfix] Do not crash V0 engine on input errors #13101
Conversation
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
|
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs would not trigger full CI run by default. Instead, it would only run Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀 |
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @joerunde and sorry for taking so long to properly digest/review this.
Given the state of v0 code, this looks reasonable to me :)
vllm/worker/model_runner.py
Outdated
| except Exception as e: | ||
| # Raise an exception that tracks the ID of the bad request | ||
| raise InputProcessingError(seq_group_metadata.request_id, | ||
| str(e)) from e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about moving this try/except to the place where this method is called, here, so that it can contain that one line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, that does sound like a better idea
|
Probably worth merging in latest main now anyhow, which will kick off all the tests again. |
Hah, reading that as: "This code may be rough but it's not worse than the surroundings" |
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com> Signed-off-by: Louis Ulmer <ulmerlouis@gmail.com>
Signed-off-by: Joe Runde <Joseph.Runde@ibm.com>
This PR adds exception handling to the V0 engine that catches errors processing the input for a sequence group, and removes that sequence group from the current batch instead of crashing the entire engine. We see this a lot when processing invalid image data for multimodal models, the mllama models are particularly bad offenders.
With this change, users will receive a 400 instead of a 500 when their request fails input processing in the engine, and other running requests will continue to process instead of also receiving 500s.