[TPU] kv cache update kernel doesn't need to be padded slices to multiple of num_slices_per_block #22394
Conversation
…ces_per_block Signed-off-by: Chengji Yao <chengjiyao@gmail.com>
I remember that before the fix, it output something like … Do you know where the output is from?
def _kv_cache_update_kernel(
    # Prefetch
    slices_ref,  # [3, padded_num_slices], list of (kv_cache_start,
If we don't pad num_slices, is slices_ref.shape[1] sufficient, so that you don't need num_slices_ref?
We still need to pad to avoid recompilation, but we don't need to pad to a multiple of num_slices_per_block.
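To illustrate the distinction, a minimal sketch (hypothetical helper names, not vLLM code): padding the slice count up to one fixed bound keeps the kernel's traced input shapes constant across steps, which is what avoids XLA recompilation; the further rounding up to a multiple of num_slices_per_block is the step this PR removes.

```python
def pad_for_stable_shape(num_slices: int, fixed_bound: int) -> int:
    # Padding every batch to the same fixed bound keeps the kernel's
    # input shapes constant across steps, so XLA does not recompile.
    assert num_slices <= fixed_bound
    return fixed_bound


def pad_to_block_multiple(num_slices: int, num_slices_per_block: int) -> int:
    # The extra rounding this PR drops: the kernel can handle a final
    # partial block itself, so this step is unnecessary.
    return -(-num_slices // num_slices_per_block) * num_slices_per_block
```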
-def _get_padded_num_kv_cache_update_slices(
-        num_tokens: int, max_num_reqs: int, page_size: int,
-        num_slices_per_kv_cache_update_block: int) -> int:
+def _get_padded_num_kv_cache_update_slices(num_tokens: int, max_num_reqs: int,
nit: remove "padded" from the function name, variable names, and comments since we don't need to pad the num_slices anymore?
As I replied in the previous comment, we still need to pad it.
-        num_tokens: int, max_num_reqs: int, page_size: int,
-        num_slices_per_kv_cache_update_block: int) -> int:
+def _get_padded_num_kv_cache_update_slices(num_tokens: int, max_num_reqs: int,
+                                           page_size: int) -> int:
nit: one more ask, could you add #19928 (comment) as a comment here?
Sure, it's added.
Signed-off-by: Chengji Yao <chengjiyao@gmail.com>
@vanbasten23 it's from the XLA execution. Changing the logic in the kernel can prevent such out-of-index errors during execution.
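A pure-Python sketch of why the kernel must bound its loop by the real slice count rather than the padded shape (illustrative code, not the actual Pallas kernel):

```python
def apply_kv_slices(slices, num_slices, new_kv, kv_cache):
    # slices is padded out to a fixed length for shape stability; only
    # the first num_slices entries are real. The padded tail may hold
    # garbage offsets, so iterating past num_slices is what can produce
    # an out-of-index error during XLA execution.
    for dst, src, length in slices[:num_slices]:
        kv_cache[dst:dst + length] = new_kv[src:src + length]
```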
…iple of num_slices_per_block (vllm-project#22394) Signed-off-by: Chengji Yao <chengjiyao@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@gmail.com> Signed-off-by: Paul Pak <paulpak58@gmail.com>
…iple of num_slices_per_block (vllm-project#22394) Signed-off-by: Chengji Yao <chengjiyao@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@gmail.com> Signed-off-by: Diego-Castan <diego.castan@ibm.com>
…iple of num_slices_per_block (vllm-project#22394) Signed-off-by: Chengji Yao <chengjiyao@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@gmail.com>
…iple of num_slices_per_block (vllm-project#22394) Signed-off-by: Chengji Yao <chengjiyao@gmail.com> Co-authored-by: Chengji Yao <chengjiyao@gmail.com> Signed-off-by: Xiao Yu <xiao.yu@amd.com>
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.

Purpose
kv cache update kernel doesn't need to be padded slices to multiple of num_slices_per_block
Test Plan
Test Result
passed
(Optional) Documentation Update