Skip to content

Conversation

@quanliu1991
Copy link
Contributor

@quanliu1991 quanliu1991 commented Oct 12, 2025

Purpose

This PR fixes a block table out-of-range error that occurs when using priority scheduling

When a low-priority request that has already been added to scheduled_running_reqs is preempted,
the corresponding entries in req_to_new_blocks and num_scheduled_tokens are not updated synchronously.

As a result, when the same request is later re-scheduled and added back to scheduled_running_reqs,
the block table allocation does not start from index 0, leading to an index out-of-range error.

This PR ensures that:

When a request is preempted, related entries in req_to_new_blocks and num_scheduled_tokens are cleared or resynchronized.

Subsequent scheduling will always allocate new block IDs starting from 0.

Fixes #23316


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

这个 PR 旨在修复一个抢占调度中的 bug,即当一个已调度的请求被抢占时,相关的资源没有被正确回收。您正确地识别出需要从 req_to_new_blocksnum_scheduled_tokens 中移除被抢占的请求。然而,在归还 token_budget 时存在一个错误:代码错误地使用了当前请求的 ID (request.request_id) 而不是被抢占请求的 ID (preempted_req.request_id)。这会导致 KeyError,因为当前请求尚未被调度。我提供了一个修复建议。请查看具体的审查评论。

Comment on lines 276 to 278
token_budget += num_scheduled_tokens[request.request_id]
req_to_new_blocks.pop(preempted_req.request_id)
num_scheduled_tokens.pop(preempted_req.request_id)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

在归还 token_budget 时存在一个错误。代码使用了 request.request_id 来查找已调度的 token 数量,但这里的 request 是指当前正在调度并触发抢占的请求,而不是被抢占的 preempted_req。这会导致 KeyError,因为当前的 request 还没有被添加到 num_scheduled_tokens 字典中。您应该使用 preempted_req.request_id

Suggested change
token_budget += num_scheduled_tokens[request.request_id]
req_to_new_blocks.pop(preempted_req.request_id)
num_scheduled_tokens.pop(preempted_req.request_id)
token_budget += num_scheduled_tokens[preempted_req.request_id]
req_to_new_blocks.pop(preempted_req.request_id)
num_scheduled_tokens.pop(preempted_req.request_id)

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Comment on lines 274 to 277
if preempted_req in scheduled_running_reqs:
scheduled_running_reqs.remove(preempted_req)
token_budget += num_scheduled_tokens[request.request_id]
req_to_new_blocks.pop(preempted_req.request_id)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P0 Badge Refund tokens for preempted request instead of current request

When a previously scheduled request is preempted, the code attempts to refund its token budget with token_budget += num_scheduled_tokens[request.request_id]. Here request is the request currently being scheduled and has not been inserted into num_scheduled_tokens, so the dictionary lookup will raise a KeyError the first time a scheduled low‑priority request is preempted. Even if the key existed by coincidence, the refunded amount would be for the wrong request and the scheduler’s token accounting would be corrupted. This line should reference preempted_req.request_id instead.

Useful? React with 👍 / 👎.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@quanliu1991 quanliu1991 changed the title [Bugfix][Core]remove preempted_req in req_to_new_blocks and num_sched… [Bugfix][Core]Fix block table out-of-range issue in priority scheduling Oct 12, 2025
…uled_tokens

Signed-off-by: quanliu <18646313696@163.com>
Copy link
Collaborator

@heheda12345 heheda12345 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@heheda12345 heheda12345 enabled auto-merge (squash) October 12, 2025 23:33
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Oct 12, 2025
@heheda12345 heheda12345 merged commit 41f3884 into vllm-project:main Oct 13, 2025
46 checks passed
1994 pushed a commit to 1994/vllm that referenced this pull request Oct 14, 2025
…ng (vllm-project#26661)

Signed-off-by: quanliu <18646313696@163.com>
Signed-off-by: 1994 <1994@users.noreply.github.com>
Dhruvilbhatt pushed a commit to Dhruvilbhatt/vllm that referenced this pull request Oct 14, 2025
…ng (vllm-project#26661)

Signed-off-by: quanliu <18646313696@163.com>
Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
bbartels pushed a commit to bbartels/vllm that referenced this pull request Oct 16, 2025
…ng (vllm-project#26661)

Signed-off-by: quanliu <18646313696@163.com>
Signed-off-by: bbartels <benjamin@bartels.dev>
lywa1998 pushed a commit to lywa1998/vllm that referenced this pull request Oct 20, 2025
alhridoy pushed a commit to alhridoy/vllm that referenced this pull request Oct 24, 2025
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…ng (vllm-project#26661)

Signed-off-by: quanliu <18646313696@163.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…ng (vllm-project#26661)

Signed-off-by: quanliu <18646313696@163.com>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…ng (vllm-project#26661)

Signed-off-by: quanliu <18646313696@163.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
0xrushi pushed a commit to 0xrushi/vllm that referenced this pull request Oct 26, 2025
…ng (vllm-project#26661)

Signed-off-by: quanliu <18646313696@163.com>
Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: [BlockTable error] DeepSeek R1 (TP=8, PP=2) will appear when input is 16k and concurrent 128

2 participants