-
-
Notifications
You must be signed in to change notification settings - Fork 11.1k
[Bugfix][Core]Fix block table out-of-range issue in priority scheduling #26661
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
这个 PR 旨在修复一个抢占调度中的 bug,即当一个已调度的请求被抢占时,相关的资源没有被正确回收。您正确地识别出需要从 req_to_new_blocks 和 num_scheduled_tokens 中移除被抢占的请求。然而,在归还 token_budget 时存在一个错误:代码错误地使用了当前请求的 ID (request.request_id) 而不是被抢占请求的 ID (preempted_req.request_id)。这会导致 KeyError,因为当前请求尚未被调度。我提供了一个修复建议。请查看具体的审查评论。
vllm/v1/core/sched/scheduler.py
Outdated
| token_budget += num_scheduled_tokens[request.request_id] | ||
| req_to_new_blocks.pop(preempted_req.request_id) | ||
| num_scheduled_tokens.pop(preempted_req.request_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
在归还 token_budget 时存在一个错误。代码使用了 request.request_id 来查找已调度的 token 数量,但这里的 request 是指当前正在调度并触发抢占的请求,而不是被抢占的 preempted_req。这会导致 KeyError,因为当前的 request 还没有被添加到 num_scheduled_tokens 字典中。您应该使用 preempted_req.request_id。
| token_budget += num_scheduled_tokens[request.request_id] | |
| req_to_new_blocks.pop(preempted_req.request_id) | |
| num_scheduled_tokens.pop(preempted_req.request_id) | |
| token_budget += num_scheduled_tokens[preempted_req.request_id] | |
| req_to_new_blocks.pop(preempted_req.request_id) | |
| num_scheduled_tokens.pop(preempted_req.request_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
| if preempted_req in scheduled_running_reqs: | ||
| scheduled_running_reqs.remove(preempted_req) | ||
| token_budget += num_scheduled_tokens[request.request_id] | ||
| req_to_new_blocks.pop(preempted_req.request_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refund tokens for preempted request instead of current request
When a previously scheduled request is preempted, the code attempts to refund its token budget with token_budget += num_scheduled_tokens[request.request_id]. Here request is the request currently being scheduled and has not been inserted into num_scheduled_tokens, so the dictionary lookup will raise a KeyError the first time a scheduled low‑priority request is preempted. Even if the key existed by coincidence, the refunded amount would be for the wrong request and the scheduler’s token accounting would be corrupted. This line should reference preempted_req.request_id instead.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
…uled_tokens Signed-off-by: quanliu <18646313696@163.com>
heheda12345
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
…ng (vllm-project#26661) Signed-off-by: quanliu <18646313696@163.com> Signed-off-by: 1994 <1994@users.noreply.github.com>
…ng (vllm-project#26661) Signed-off-by: quanliu <18646313696@163.com> Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
…ng (vllm-project#26661) Signed-off-by: quanliu <18646313696@163.com> Signed-off-by: bbartels <benjamin@bartels.dev>
…ng (vllm-project#26661) Signed-off-by: quanliu <18646313696@163.com>
…ng (vllm-project#26661) Signed-off-by: quanliu <18646313696@163.com>
…ng (vllm-project#26661) Signed-off-by: quanliu <18646313696@163.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
…ng (vllm-project#26661) Signed-off-by: quanliu <18646313696@163.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
…ng (vllm-project#26661) Signed-off-by: quanliu <18646313696@163.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
…ng (vllm-project#26661) Signed-off-by: quanliu <18646313696@163.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
This PR fixes a block table out-of-range error that occurs when using priority scheduling
When a low-priority request that has already been added to scheduled_running_reqs is preempted,
the corresponding entries in req_to_new_blocks and num_scheduled_tokens are not updated synchronously.
As a result, when the same request is later re-scheduled and added back to scheduled_running_reqs,
the block table allocation does not start from index 0, leading to an index out-of-range error.
This PR ensures that:
When a request is preempted, related entries in req_to_new_blocks and num_scheduled_tokens are cleared or resynchronized.
Subsequent scheduling will always allocate new block IDs starting from 0.
Fixes #23316
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.