
Conversation

@LCAIZJ (Contributor) commented Nov 29, 2025

What this PR does / why we need it?

Fix a KV pool precision issue caused by missing synchronization: the asynchronous KV cache save could start before the model's forward pass had finished writing the cache.
Fixes #4412

@github-actions commented

👋 Hi! Thank you for contributing to the vLLM Ascend project. The following points will speed up your PR merge:

  • A PR should do only one thing; smaller PRs enable faster reviews.
  • Every PR should include unit tests and end-to-end tests to ensure it works and is not broken by future PRs.
  • Write the commit message by filling in the PR description to help reviewers and future developers understand the change.

If CI fails, you can run the linting and testing checks locally according to Contributing and Testing.

@gemini-code-assist (bot) left a comment

Code Review

This pull request addresses a critical race condition related to KV cache saving. By moving the trigger for the save operation from execute_model to sample_tokens, it ensures that the KV cache is fully computed before being saved, preventing data corruption. Additionally, redundant and ineffective synchronization calls have been removed from the KV transfer threads, improving code clarity. The changes are correct and significantly improve the robustness of the KV pooling mechanism.

# tokens on the CPU, so they are run after bookkeeping.
propose_draft_token_ids(valid_sampled_token_ids)

self.maybe_wait_for_kv_save()

critical

Moving self.maybe_wait_for_kv_save() to this location from execute_model is a critical fix for a race condition. Previously, the KV cache save operation could be triggered before the model's forward pass had completed, potentially leading to corrupted data being saved. By placing it here, after sampling operations that implicitly synchronize the device, we ensure the KV cache is fully populated and stable before initiating the save.

A minor suggestion for future improvement: the method name maybe_wait_for_kv_save is misleading as it appears to trigger an asynchronous save rather than waiting. Renaming it to something like trigger_kv_save_if_needed would improve code clarity.
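The ordering argument above can be illustrated with a minimal, hypothetical Python sketch. The names (`ToyRunner`, `execute_model`, `sample_tokens`, `maybe_wait_for_kv_save`) mirror the discussion but are placeholders, not the actual vLLM Ascend API; a `threading.Event` stands in for the device-side completion of the forward pass:

```python
import threading

class ToyRunner:
    """Toy model of the execute/sample split. Assumption: sampling cannot
    complete until the forward pass (and its KV cache writes) has finished,
    so it acts as the implicit synchronization point."""

    def __init__(self):
        self._forward_done = threading.Event()
        self.saved_blocks = []

    def execute_model(self):
        # The forward pass runs asynchronously on the device; the KV cache
        # is still being written here, so triggering the save at this point
        # would race with the compute (the pre-fix behavior).
        threading.Timer(0.02, self._forward_done.set).start()

    def sample_tokens(self):
        # Sampling implicitly synchronizes with the device: no tokens exist
        # until the forward pass is complete.
        self._forward_done.wait()
        sampled = [101, 102]
        # Safe point to trigger the save: the KV cache is fully populated.
        self.maybe_wait_for_kv_save()
        return sampled

    def maybe_wait_for_kv_save(self):
        self.saved_blocks.append("kv_block_0")

runner = ToyRunner()
runner.execute_model()
tokens = runner.sample_tokens()
print(tokens, runner.saved_blocks)  # [101, 102] ['kv_block_0']
```

The save in this sketch is ordered after the wait on `_forward_done`, which is the essence of moving the trigger from `execute_model` to `sample_tokens`.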

Contributor commented:

LGTM

addr_list_tp = addr_list[self.tp_rank % self.put_step::self.put_step]
size_list_tp = size_list[self.tp_rank % self.put_step::self.put_step]
if key_list_tp:
torch.npu.current_stream().synchronize()

high

The removal of torch.npu.current_stream().synchronize() is correct. This call was ineffective for synchronizing with the main model execution stream where the KV cache is produced, as it only synchronizes operations within the current thread's stream. Since there were no preceding NPU operations on this stream, the call was a no-op. The actual fix for the synchronization race condition is handled elsewhere by moving when the save operation is triggered. Removing this redundant synchronize() call cleans up the code.
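Why a current-stream synchronize in a worker thread is a no-op can be sketched with a toy per-thread stream model. This is an assumption-level analogy to the stream semantics described above, not the `torch.npu` implementation: each stream only waits for work enqueued on itself, so synchronizing an empty stream imposes no ordering with the producer:

```python
import threading

class ToyStream:
    """Toy stream: synchronize() waits only for work enqueued on *this*
    stream, mirroring the per-thread current-stream semantics discussed
    in the review comment."""

    def __init__(self):
        self._pending = []

    def enqueue(self, done_event):
        self._pending.append(done_event)

    def synchronize(self):
        for ev in self._pending:
            ev.wait()

compute_stream = ToyStream()   # main thread: produces the KV cache
transfer_stream = ToyStream()  # sending thread: its own, empty stream

kv_written = threading.Event()
compute_stream.enqueue(kv_written)             # forward pass on compute stream
threading.Timer(0.02, kv_written.set).start()  # finishes asynchronously

# In the sending thread this returns immediately: the transfer stream has
# no pending work, so the call is a no-op and provides no ordering with
# the compute stream -- which is why the removed synchronize() was dead code.
transfer_stream.synchronize()

# Correct cross-stream ordering waits on work recorded on the *producer*
# stream (or, as this PR does, triggers the save only after an operation
# that has already synchronized with the forward pass).
compute_stream.synchronize()
print("kv cache stable:", kv_written.is_set())
```

In real device code the equivalent handshake would be an event recorded on the producing stream and waited on by the consumer, rather than a synchronize on the consumer's own stream.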

addr_list_tp = addr_list[self.tp_rank % self.put_step::self.put_step]
size_list_tp = size_list[self.tp_rank % self.put_step::self.put_step]
if key_list_tp:
torch.npu.current_stream().synchronize()

high

Similar to the change in KVCacheStoreSendingThread, removing torch.npu.current_stream().synchronize() here is correct. The call was redundant and did not provide the necessary cross-stream synchronization. This change improves code clarity.
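As a side note on the quoted snippets, the `[self.tp_rank % self.put_step::self.put_step]` slicing partitions the key/address/size lists across tensor-parallel ranks with a fixed stride. A minimal sketch with illustrative values (`put_step`, the list contents, and `shard_for_rank` are assumptions for demonstration, not values from the PR):

```python
# Illustrative values only, not taken from the PR.
put_step = 4
key_list = [f"key{i}" for i in range(10)]

def shard_for_rank(lst, tp_rank, step):
    # Each tensor-parallel rank takes every step-th element, offset by
    # tp_rank % step, so the ranks partition the list without overlap.
    return lst[tp_rank % step::step]

shards = [shard_for_rank(key_list, r, put_step) for r in range(put_step)]
print(shards[1])  # ['key1', 'key5', 'key9']

# Together the per-rank shards cover the full list exactly once.
assert sorted(sum(shards, [])) == sorted(key_list)
```

This strided split spreads the put workload evenly across ranks, which is why each sending thread only handles its own disjoint subset of the KV blocks.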

@LCAIZJ LCAIZJ changed the title [KVPool ] Solve precision synchronization [bugfix] Solve kvpool precision synchronization Nov 29, 2025
@LCAIZJ LCAIZJ changed the title [bugfix] Solve kvpool precision synchronization [Bugfix] Fix kvpool precision synchronization Nov 29, 2025
@wangxiyuan (Collaborator) commented:

please fix the merge conflict

@github-actions commented

This pull request has conflicts, please resolve those before we can evaluate the pull request.

Co-authored-by: baxingpiaochong <771405853@qq.com>

Signed-off-by: LCAIZJ <leichao139636@163.com>
@LCAIZJ (Contributor, Author) commented Nov 29, 2025

> please fix the merge conflict

It’s working now.

Signed-off-by: LCAIZJ <leichao139636@163.com>
@wangxiyuan wangxiyuan merged commit ff70613 into vllm-project:main Nov 30, 2025
22 checks passed
3 participants